
arxiv: 2511.22958 · v2 · submitted 2025-11-28 · 💻 cs.CV

Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records

Pith reviewed 2026-05-17 04:14 UTC · model grok-4.3

classification 💻 cs.CV
keywords contrastive learning · solar imaging · pretraining · Solar Dynamics Observatory · flare classification · cross-modal translation · AIA · HMI

The pith

Contrastive pretraining on paired AIA-HMI solar images produces backbones that lead on flare classification and cross-modal translation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops SolarCHIP, a family of visual encoders pretrained with contrastive learning on multi-instrument observations from the Solar Dynamics Observatory. The method targets three solar-specific difficulties: views from different instruments, classes that overlap because the Sun evolves slowly, and high variation inside each class caused by sparse activity. It does so through a multi-granularity contrastive objective that aligns global tokens across simultaneous image pairs, keeps local patch features consistent at the same sky positions, and preserves spatial structure within each image. When these pretrained models are applied to turning HMI magnetograms into AIA images and to classifying full-disk flares, they reach higher accuracy than task-specific training or natural-image pretraining, with the largest gains appearing when labeled examples are few. A reader working on solar data would care because the released weights offer a reusable starting point that lowers the amount of new labels and compute needed for new solar imaging problems.

Core claim

SolarCHIP addresses three key challenges in solar imaging: multimodal sensing across AIA and HMI instruments, weak inter-class separability due to slow temporal evolution, and strong intra-class variability with sparse activity signals. Our pretraining framework employs a multi-granularity contrastive objective that jointly aligns global class tokens across co-temporal AIA-HMI pairs to enhance temporal discrimination, local patch tokens at fixed spatial indices to enforce position-consistent, modality-invariant features, and intra-sample patches across different spatial locations to preserve fine-grained spatial structure. We train both CNN- and Vision Transformer-based autoencoders and show

What carries the argument

Multi-granularity contrastive objective that jointly aligns global class tokens across co-temporal AIA-HMI pairs, local patch tokens at fixed spatial indices, and intra-sample patches across different spatial locations.
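
The three alignment terms can be sketched with a plain InfoNCE loss. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the temperature of 0.1, the equal weights, and the choice of negatives for each term (other timestamps for the global term, other samples at the same patch index for the patch term, other locations inside the same sample for the intra-sample term) are all hypothetical readings of the description above.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i] and no other row."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

def multi_granularity_loss(aia_cls, hmi_cls, aia_patches, hmi_patches,
                           weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three contrastive terms (hypothetical formulation).

    aia_cls, hmi_cls:         [batch][dim] class tokens of co-temporal pairs
    aia_patches, hmi_patches: [batch][n_patches][dim] patch tokens
    """
    batch = len(aia_cls)
    n_patches = len(aia_patches[0])

    # (1) Global: class tokens of co-temporal pairs vs. other timestamps.
    global_term = info_nce(aia_cls, hmi_cls)

    # (2) Patch: same spatial index k across modalities, contrasted over the batch.
    patch_term = sum(
        info_nce([aia_patches[b][k] for b in range(batch)],
                 [hmi_patches[b][k] for b in range(batch)])
        for k in range(n_patches)
    ) / n_patches

    # (3) Intra-sample: within one sample, contrast different spatial locations.
    intra_term = sum(
        info_nce(aia_patches[b], hmi_patches[b]) for b in range(batch)
    ) / batch

    w1, w2, w3 = weights
    return w1 * global_term + w2 * patch_term + w3 * intra_term
```

The sketch is one-directional (AIA anchors, HMI positives); a symmetric variant would average it with the same call with the two modalities swapped.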

If this is right

  • SolarCHIP backbones reach state-of-the-art results on both cross-modal translation between HMI and AIA passbands and on full-disk flare classification.
  • Gains are largest in low-resource regimes where only limited labeled data is available for fine-tuning.
  • Ablation tests confirm that global token alignment, fixed-position patch alignment, and intra-sample patch alignment each add measurable discriminative power.
  • The released pretrained weights function as plug-and-play extractors that reduce overall compute and label requirements for new solar tasks.
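
The translation results in the first bullet are scored in the paper with MSE, PSNR, and SSIM between the translated image and the ground truth. As a reminder of how the first two relate, here is a small sketch; the `data_range` default of 1.0 assumes images scaled to [0, 1], which the source does not state, and SSIM is omitted because it needs windowed statistics.

```python
import math

def mse(pred, target):
    # Mean squared error over two equally sized 2-D images (lists of rows).
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("images must have the same number of pixels")
    return sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)

def psnr(pred, target, data_range=1.0):
    # Peak signal-to-noise ratio in dB; data_range is an assumed pixel scale.
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / err)
```

Because PSNR is a monotone decreasing function of MSE for a fixed `data_range`, the two metrics always rank models identically; SSIM is the one that adds structural information.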

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment strategy could be tested on other instrument pairs or on time-series stacks to see whether temporal consistency improves further.
  • Better initial features might shorten the training time of operational space-weather models that ingest SDO data in real time.
  • The approach could be tried on ground-based solar observations that share similar slow-evolution and sparse-signal traits.

Load-bearing premise

The three contrastive alignment terms together overcome multimodal differences, weak class separation, and strong intra-class variability in SDO images without introducing new biases or needing heavy tuning.

What would settle it

Training a model from random initialization on the complete labeled flare dataset and obtaining equal or higher accuracy than the SolarCHIP-pretrained version would show the pretraining adds no lasting benefit.

Figures

Figures reproduced from arXiv: 2511.22958 by Bin Pan, Shiyu Shen, Taifeng Chai, Yang Huang, Zhe Gao.

Figure 1: Overview of SolarCHIP training pipeline. We first preprocess SDO images with calibration and augmentation. The preprocessed images are then fed into modality-specific encoders and decoders to extract tokens and reconstruct inputs. We apply three contrastive learning objectives at different granularities: (a) class-level, (b) patch-level, and (c) intra-sample contrastive learning. The similarity matrices illustra…
Figure 2: Overview of downstream applications. We train a ControlNet with pretrained encoder for cross-modal translation between HMI and AIA (top). We append a classification head to the HMI encoder for solar flare recognition (bottom).
Figure 3: Visualization of translated images from HMI to AIA. The top row shows the synthesized AIA images conditioned on the same HMI inputs, while the bottom row displays the corresponding ground truth AIA observations.
Figure 4: Visualization of translated images from AIA to HMI. The leftmost column shows the ground truth HMI observation, while the right two rows display the synthesized HMI images conditioned on the corresponding AIA inputs.
Table I: Comparisons of Translation Quality. We evaluate the translation quality by the MSE, PSNR, SSIM between translated image and ground truth. Smaller MSE and larger PSNR, SSIM indicate be…
Figure 5: Few-shot Adaption Results. We progressively reduce the number of labeled training samples to simulate low-resource scenarios, specifically using 100%, 50%, 20%, 10%, and 5% of the original training set. Both SolarCHIP-pretrained models and randomly initialized baselines are fine-tuned under these constraints. The performance is assessed using (a) the accuracy of all classes as well as the (b) ⩾M and (c) ⩾C…
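
The label-reduction protocol in the Figure 5 caption can be sketched as follows. The fractions come from the caption; the seeded shuffle, the nesting of smaller subsets inside larger ones, and the name `nested_subsets` are assumptions made for illustration, since the caption does not say how the subsets were drawn.

```python
import random

# Fractions of the labeled training set listed in the Figure 5 caption.
FRACTIONS = (1.0, 0.5, 0.2, 0.1, 0.05)

def nested_subsets(n_samples, fractions=FRACTIONS, seed=0):
    """Return {fraction: list of sample indices}.

    One seeded shuffle is shared by all fractions, so every smaller subset
    is a prefix of (and therefore contained in) every larger one.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return {f: order[:max(1, round(f * n_samples))] for f in fractions}
```

Nesting keeps the comparison across fractions from being confounded by which samples happened to be drawn. The sketch is not stratified by flare class; with rare classes, a per-class version of the same prefix trick may be preferable.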
Original abstract

Deep learning has revolutionized solar image analysis, yet most approaches train task-specific encoders from scratch or rely on natural-image pretraining that ignores the unique characteristics of Solar Dynamics Observatory (SDO) data. We introduce SolarCHIP, a family of contrastively pretrained visual backbones tailored to multi-instrument SDO observations. SolarCHIP addresses three key challenges in solar imaging: multimodal sensing across AIA and HMI instruments, weak inter-class separability due to slow temporal evolution, and strong intra-class variability with sparse activity signals. Our pretraining framework employs a multi-granularity contrastive objective that jointly aligns (1) global class tokens across co-temporal AIA-HMI pairs to enhance temporal discrimination, (2) local patch tokens at fixed spatial indices to enforce position-consistent, modality-invariant features, and (3) intra-sample patches across different spatial locations to preserve fine-grained spatial structure. We train both CNN- and Vision Transformer-based autoencoders and demonstrate their effectiveness on two downstream tasks: cross-modal translation between HMI and AIA passbands via ControlNet, and full-disk flare classification. Experimental results show that SolarCHIP achieves state-of-the-art performance across both tasks, with particularly strong gains in low-resource settings where labeled data is limited. Ablation studies confirm that each contrastive component contributes essential discriminative capacity at different granularities. By publicly releasing pretrained weights and training code, we provide the heliophysics community with a practical, plug-and-play feature extractor that reduces computational requirements, improves label efficiency, and establishes a reusable foundation for diverse solar imaging applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SolarCHIP, a family of contrastively pretrained visual backbones for multi-instrument SDO observations. It employs a multi-granularity contrastive objective that aligns global class tokens across co-temporal AIA-HMI pairs, enforces local patch consistency at fixed spatial indices, and preserves intra-sample spatial structure. The work claims state-of-the-art results on cross-modal translation (via ControlNet) and full-disk flare classification, with particularly strong gains in low-resource labeled-data regimes, supported by ablation studies on component contributions.

Significance. If the reported gains are shown to arise from the pretraining rather than data artifacts, this would deliver a reusable, domain-specific feature extractor that improves label efficiency for solar imaging tasks and better handles multimodal sensing and intra-class variability. The public release of pretrained weights and training code is a clear practical strength for the heliophysics community.

major comments (2)
  1. [§4 (Experiments)] §4 (Experiments) and associated data description: The manuscript does not explicitly state the train/test partitioning protocol for the flare classification task. Given the abstract's emphasis on 'slow temporal evolution' causing 'weak inter-class separability' and the reliance on co-temporal AIA-HMI pairs, any non-strict chronological split (e.g., random per-image or per-day without a multi-day buffer) risks temporal leakage that could inflate both absolute performance and the delta versus baselines in the low-label regime.
  2. [Ablation studies] Ablation studies (likely §4.3 or Table 3): While each contrastive component is stated to contribute discriminative capacity, the reported metrics lack error bars, multiple random seeds, or statistical tests. This makes it difficult to confirm that the observed improvements are robust rather than sensitive to hyperparameter choices or particular data folds.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'both CNN- and Vision Transformer-based autoencoders' should clarify whether these architectures are used only for pretraining or also as downstream feature extractors.
  2. [Methods] Notation: Ensure uniform terminology for 'global class token', 'local patch tokens', and 'intra-sample patches' when describing the three contrastive terms across methods and experiments sections.
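
The leakage concern in major comment 1 can be made concrete with a sketch of a chronological split that leaves a gap between the two sets. The function name, the cutoff date, and the 7-day default buffer are placeholders chosen for illustration, not values taken from the paper.

```python
from datetime import datetime, timedelta

def chronological_split(timestamps, cutoff, buffer_days=7):
    """Split sample indices by observation time.

    Samples before `cutoff` go to train, samples at or after
    `cutoff + buffer_days` go to test, and samples inside the buffer are
    dropped so near-duplicate frames cannot sit on both sides.
    """
    buffer_end = cutoff + timedelta(days=buffer_days)
    train, test = [], []
    for i, t in enumerate(timestamps):
        if t < cutoff:
            train.append(i)
        elif t >= buffer_end:
            test.append(i)
    return train, test

# Usage with made-up daily observations.
times = [datetime(2020, 1, 1) + timedelta(days=d) for d in range(20)]
train_idx, test_idx = chronological_split(times, datetime(2020, 1, 11), buffer_days=5)
```

A random per-image split of the same data would place frames taken minutes apart into both sets, which is exactly the situation the referee describes.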

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the specific revisions we will incorporate.

read point-by-point responses
  1. Referee: [§4 (Experiments)] §4 (Experiments) and associated data description: The manuscript does not explicitly state the train/test partitioning protocol for the flare classification task. Given the abstract's emphasis on 'slow temporal evolution' causing 'weak inter-class separability' and the reliance on co-temporal AIA-HMI pairs, any non-strict chronological split (e.g., random per-image or per-day without a multi-day buffer) risks temporal leakage that could inflate both absolute performance and the delta versus baselines in the low-label regime.

    Authors: We thank the referee for identifying this omission. The original manuscript described the data sources and task setup but did not provide an explicit statement of the chronological partitioning protocol. In the revised version we have added a dedicated paragraph in §4.1 that specifies a strict chronological split: training data are drawn exclusively from earlier time periods, with a multi-day temporal buffer separating the training and test sets to eliminate any possibility of leakage. This protocol was followed for all reported experiments, including the low-resource regimes, and directly respects the slow temporal evolution of solar phenomena highlighted in the abstract. We believe this addition fully resolves the concern while preserving the validity of the performance deltas. revision: yes

  2. Referee: [Ablation studies] Ablation studies (likely §4.3 or Table 3): While each contrastive component is stated to contribute discriminative capacity, the reported metrics lack error bars, multiple random seeds, or statistical tests. This makes it difficult to confirm that the observed improvements are robust rather than sensitive to hyperparameter choices or particular data folds.

    Authors: We agree that the absence of variability measures limits the strength of the ablation claims. In the revised manuscript we have re-executed the ablation suite across five independent random seeds, now reporting mean performance together with standard-deviation error bars in the updated Table 3. We have also added a brief statistical analysis (paired Wilcoxon signed-rank tests) confirming that the gains from each contrastive component remain significant across seeds. These additions demonstrate that the observed contributions are robust rather than artifacts of a single run or particular fold. revision: yes
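
The reporting described in the second response (mean and standard deviation over seeds, plus a paired Wilcoxon signed-rank test) can be sketched in plain Python. The function names are hypothetical and the exact test is done by enumerating all sign assignments, which is only practical for a small number of pairs.

```python
from itertools import product
from statistics import mean, stdev

def summarize(scores):
    # Mean and sample standard deviation across seeds.
    return mean(scores), stdev(scores)

def wilcoxon_exact(diffs):
    """Two-sided exact p-value of the Wilcoxon signed-rank test.

    `diffs` are paired differences (e.g. full model minus ablated model,
    one per seed). Zero differences are dropped; ties get average ranks.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return hits / 2 ** n
```

One property worth keeping in mind when reading such a table: with five pairs the smallest two-sided exact p-value this test can produce is 2/32 = 0.0625, reached when all five differences share a sign. Whether a result can fall below 0.05 therefore depends on how the pairs are formed (for example, pooled across metrics or settings), which the response does not specify.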

Circularity Check

0 steps flagged

No circularity: pretraining framework is self-contained with external validation

full rationale

The paper presents SolarCHIP as a new contrastive pretraining approach using multi-granularity losses on SDO data, with downstream evaluation on cross-modal translation and flare classification. No equations, derivations, or performance claims reduce by construction to fitted parameters defined inside the paper, nor do they rely on self-citation chains or imported uniqueness theorems. The method is described as addressing specific solar imaging challenges through explicit objectives, and results are tied to empirical ablations and external tasks rather than tautological redefinitions. This matches the default case of a self-contained empirical contribution without load-bearing internal loops.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so detailed free parameters, axioms, and invented entities cannot be audited from the full text. The approach implicitly relies on standard assumptions of contrastive learning and the utility of co-temporal multi-instrument pairs.

axioms (2)
  • domain assumption: Co-temporal AIA-HMI image pairs provide reliable alignment signals for learning modality-invariant features.
    Invoked in the description of the global class token alignment objective.
  • domain assumption: Slow temporal evolution and sparse activity signals in solar data require explicit multi-granularity contrastive terms to achieve discriminative features.
    Stated as addressing the key challenges of weak inter-class separability and strong intra-class variability.

pith-pipeline@v0.9.0 · 5586 in / 1408 out tokens · 26903 ms · 2026-05-17T04:14:20.239721+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Our pretraining framework employs a multi-granularity contrastive objective that jointly aligns (1) global class tokens across co-temporal AIA-HMI pairs ... (2) local patch tokens at fixed spatial indices ... (3) intra-sample patches across different spatial locations

  • IndisputableMonolith/Foundation/ArrowOfTime.lean · forward_accumulates · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We use the SDO archive spanning 2010-2024. Data from 2010-2020 constitute the training split and 2020-2024 the test split ... with no temporal overlap.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 4 internal anchors

  1. [1]

    Deep flare net (defn) model for solar flare prediction,

    N. Nishizuka, K. Sugiura, Y . Kubo, M. Den, and M. Ishii, “Deep flare net (defn) model for solar flare prediction,”The Astrophysical Journal, vol. 858, no. 2, p. 113, 2018

  2. [2]

    Flare index prediction with machine learning algorithms,

    A. Chen, Q. Ye, and J. Wang, “Flare index prediction with machine learning algorithms,”Solar Physics, vol. 296, no. 10, p. 150, 2021

  3. [3]

    Advancing solar flare prediction using deep learning with active region patches,

    C. Pandey, T. Adeyeha, J. Hong, R. A. Angryk, and B. Aydin, “Advancing solar flare prediction using deep learning with active region patches,” inJoint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2024, pp. 50–65. UNDER REVIEW 11

  4. [4]

    Multi-channel coronal hole detection with convolutional neural networks,

    R. Jarolim, A. Veronig, S. Hofmeister, S. Heinemann, M. Temmer, T. Podladchikova, and K. Dissauer, “Multi-channel coronal hole detection with convolutional neural networks,”Astronomy & Astrophysics, vol. 652, p. A13, 2021

  5. [5]

    Image processing methods for coronal hole segmentation, matching, and map classification,

    V . Jatla, M. S. Pattichis, and C. N. Arge, “Image processing methods for coronal hole segmentation, matching, and map classification,”IEEE Transactions on Image Processing, vol. 29, pp. 1641–1653, 2019

  6. [6]

    Enhancing sdo/hmi images using deep learning,

    C. D. Baso and A. A. Ramos, “Enhancing sdo/hmi images using deep learning,”Astronomy & Astrophysics, vol. 614, p. A5, 2018

  7. [7]

    Super-resolution of sdo/hmi magnetograms using novel deep learning methods,

    S. Rahman, Y .-J. Moon, E. Park, A. Siddique, I.-H. Cho, and D. Lim, “Super-resolution of sdo/hmi magnetograms using novel deep learning methods,”The Astrophysical Journal Letters, vol. 897, no. 2, p. L32, 2020

  8. [8]

    A foundation model for the solar dynamics observatory.arXiv preprint arXiv:2410.02530, 2024

    J. Walsh, D. G. Gass, R. R. Pollan, P. J. Wright, R. Galvez, N. Kas- manoff, J. Naradowsky, A. Spalding, J. Parr, and A. G. Baydin, “A foundation model for the solar dynamics observatory,”arXiv preprint arXiv:2410.02530, 2024

  9. [9]

    Ai foundation model for heliophysics: Applications, design, and implementation,

    S. Roy, T. Singh, M. Freitag, J. Schmude, R. Lal, D. Hegde, S. Ranjan, A. Lin, V . Gaur, E. E. V oset al., “Ai foundation model for heliophysics: Applications, design, and implementation,”Nature Astronomy, 2024

  10. [10]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  11. [11]

    Learning transferable visual models from natural language supervision,

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763

  12. [12]

    DINOv3

    O. Siméoni, H. V . V o, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V . Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoaet al., “Dinov3,” arXiv preprint arXiv:2508.10104, 2025

  13. [13]

    Self-supervised learning with swin transformers,

    Z. Xie, Y . Lin, Z. Yao, Z. Zhang, Q. Dai, Y . Cao, and H. Hu, “Self-supervised learning with swin transformers,”arXiv preprint arXiv:2105.04553, 2021

  14. [14]

    Segment anything,

    A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026

  15. [15]

    Deep transfer learning for image classification: a survey,

    J. Plested and T. Gedeon, “Deep transfer learning for image classification: a survey,”arXiv preprint arXiv:2205.09904, 2022

  16. [16]

    Visual instruction tuning,

    H. Liu, C. Li, Q. Wu, and Y . J. Lee, “Visual instruction tuning,”Advances in neural information processing systems, vol. 36, pp. 34 892–34 916, 2023

  17. [17]

    In-flight cross-calibration of hrieuv/eui and aia/sdo,

    S. Shestov, A. Zhukov, F. Auchère, D. Berghmans, and J. Loicq, “In-flight cross-calibration of hrieuv/eui and aia/sdo,”Astronomy & Astrophysics, vol. 699, p. A7, 2025

  18. [18]

    Solar force-free magnetic fields,

    T. Wiegelmann and T. Sakurai, “Solar force-free magnetic fields,”Living Reviews in Solar Physics, vol. 18, no. 1, p. 1, 2021

  19. [19]

    Multichannel autocalibration for the atmospheric imaging assembly using machine learning,

    L. F. Dos Santos, S. Bose, V . Salvatelli, B. Neuberg, M. C. Cheung, M. Janvier, M. Jin, Y . Gal, P. Boerner, and A. G. Baydin, “Multichannel autocalibration for the atmospheric imaging assembly using machine learning,”Astronomy & Astrophysics, vol. 648, p. A53, 2021

  20. [20]

    On the variation of solar coronal rotation using sdo/aia observations,

    J. Sharma, B. Kumar, A. K. Malik, and H. O. Vats, “On the variation of solar coronal rotation using sdo/aia observations,”Monthly Notices of the Royal Astronomical Society, vol. 492, no. 4, pp. 5391–5398, 2020

  21. [21]

    Variation in solar differential rotation and activity in the period 1964–2016 determined by the kanzelhöhe data set,

    I. P. Beljan, R. Jurdana-Šepi ´c, T. Jurki ´c, R. Brajša, I. Skoki ´c, D. Sudar, D. Ruždjak, D. Hržina, W. Pötzi, A. Hanslmeieret al., “Variation in solar differential rotation and activity in the period 1964–2016 determined by the kanzelhöhe data set,”Astronomy & astrophysics, vol. 663, p. A24, 2022

  22. [22]

    How to train your flare prediction model: Revisiting robust sampling of rare events,

    A. Ahmadzadeh, B. Aydin, M. K. Georgoulis, D. J. Kempton, S. S. Mahajan, and R. A. Angryk, “How to train your flare prediction model: Revisiting robust sampling of rare events,”The Astrophysical Journal Supplement Series, vol. 254, no. 2, p. 23, 2021

  23. [23]

    Factors that determine the power-law index of an energy distribution of solar flares,

    T. Kawai and S. Imada, “Factors that determine the power-law index of an energy distribution of solar flares,”The Astrophysical Journal, vol. 931, no. 2, p. 113, 2022

  24. [24]

    High- resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695

  25. [25]

    Adding conditional control to text-to-image diffusion models,

    L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 3836–3847

  26. [26]

    The solar dynamics observatory (sdo),

    W. D. Pesnell, B. J. Thompson, and P. Chamberlin, “The solar dynamics observatory (sdo),” inThe solar dynamics observatory. Springer, 2012, pp. 3–15

  27. [27]

    Solar stereoscopy with stereo/euvi a and b spacecraft from small (6) to large (170) spacecraft separation angles,

    M. J. Aschwanden, J.-P. Wülser, N. Nitta, and J. Lemen, “Solar stereoscopy with stereo/euvi a and b spacecraft from small (6) to large (170) spacecraft separation angles,”Solar Physics, vol. 281, no. 1, pp. 101–119, 2012

  28. [28]

    The helioseismic and magnetic imager (hmi) investigation for the solar dynamics observatory (sdo),

    P. H. Scherrer, J. Schou, R. Bush, A. Kosovichev, R. Bogart, J. Hoeksema, Y . Liu, T. Duvall Jr, J. Zhao, A. Titleet al., “The helioseismic and magnetic imager (hmi) investigation for the solar dynamics observatory (sdo),”Solar Physics, vol. 275, no. 1, pp. 207–227, 2012

  29. [29]

    Design and ground calibration of the helioseismic and magnetic imager (hmi) instrument on the solar dynamics observatory (sdo),

    J. Schou, P. H. Scherrer, R. I. Bush, R. Wachter, S. Couvidat, M. C. Rabello-Soares, R. S. Bogart, J. Hoeksema, Y . Liu, T. Duvall Jret al., “Design and ground calibration of the helioseismic and magnetic imager (hmi) instrument on the solar dynamics observatory (sdo),”Solar Physics, vol. 275, no. 1, pp. 229–259, 2012

  30. [30]

    Benchmarking atomic data for astrophysics: a first look at the soft x-ray lines,

    G. Del Zanna, “Benchmarking atomic data for astrophysics: a first look at the soft x-ray lines,”Astronomy & Astrophysics, vol. 546, p. A97, 2012

  31. [31]

    The interface region imaging spectrograph (iris),

    B. De Pontieu, A. Title, J. Lemen, G. Kushner, D. Akin, B. Allard, T. Berger, P. Boerner, M. Cheung, C. Chouet al., “The interface region imaging spectrograph (iris),”Solar Physics, vol. 289, no. 7, pp. 2733– 2779, 2014

  32. [32]

    The helioseismic and magnetic imager (hmi) vector magnetic field pipeline: Sharps–space-weather hmi active region patches,

    M. G. Bobra, X. Sun, J. T. Hoeksema, M. Turmon, Y . Liu, K. Hayashi, G. Barnes, and K. Leka, “The helioseismic and magnetic imager (hmi) vector magnetic field pipeline: Sharps–space-weather hmi active region patches,”Solar Physics, vol. 289, no. 9, pp. 3549–3578, 2014

  33. [33]

    The sunpy project: Open source development and status of the version 1.0 core package,

    W. T. Barnes, M. G. Bobra, S. D. Christe, N. Freij, L. A. Hayes, J. Ireland, S. Mumford, D. Perez-Suarez, D. F. Ryan, A. Y . Shihet al., “The sunpy project: Open source development and status of the version 1.0 core package,”The Astrophysical Journal, vol. 890, no. 1, p. 68, 2020

  34. [34]

    aiapy: A python package for analyzing solar euv image data from aia,

    W. Barnes, M. C. Cheung, M. G. Bobra, P. Boerner, G. Chintzoglou, D. Leonard, S. Mumford, N. Padmanabhan, A. Shih, N. Shirmanet al., “aiapy: A python package for analyzing solar euv image data from aia,” Journal of Open Source Software (JOSS), vol. 5, no. 55, pp. 2801–2801, 2020

  35. [35]

    Solar flare prediction using sdo/hmi vector magnetic field data with a machine-learning algorithm,

    M. G. Bobra and S. Couvidat, “Solar flare prediction using sdo/hmi vector magnetic field data with a machine-learning algorithm,”The Astrophysical Journal, vol. 798, no. 2, p. 135, 2015

  36. [36]

    Knowledge-informed deep neural networks for solar flare forecasting,

    M. Li, Y . Cui, B. Luo, X. Ao, S. Liu, J. Wang, S. Li, C. Du, X. Sun, and X. Wang, “Knowledge-informed deep neural networks for solar flare forecasting,”Space weather, vol. 20, no. 8, p. e2021SW002985, 2022

  37. [37]

    Forecasting solar flares with a transformer network,

    K. Pelkum Donahue and F. Inceoglu, “Forecasting solar flares with a transformer network,”Frontiers in Astronomy and Space Sciences, vol. 10, p. 1298609, 2024

  38. [38]

    Segmentation of coronal holes in solar disc images with a convolutional neural network,

    E. A. Illarionov and A. G. Tlatov, “Segmentation of coronal holes in solar disc images with a convolutional neural network,”Monthly Notices of the Royal Astronomical Society, vol. 481, no. 4, pp. 5014–5021, 2018

  39. [39]

    A community data set for comparing automated coronal hole detection schemes,

    M. A. Reiss, K. Muglach, E. Mason, E. E. Davies, S. Chakraborty, V . Delouille, C. Downs, T. M. Garton, J. A. Grajeda, A. Hamadaet al., “A community data set for comparing automated coronal hole detection schemes,”The Astrophysical Journal Supplement Series, vol. 271, no. 1, p. 6, 2024

  40. [40]

    Generation of solar uv and euv images from sdo/hmi magnetograms by deep learning,

    E. Park, Y .-J. Moon, J.-Y . Lee, H. Lee, D. Lim, G. Shin, T. Kimet al., “Generation of solar uv and euv images from sdo/hmi magnetograms by deep learning,”The Astrophysical Journal Letters, vol. 884, no. 1, p. L23, 2019

  41. [41]

    Image desaturation for sdo/aia using deep learning,

    X. Yu, L. Xu, and Y . Yan, “Image desaturation for sdo/aia using deep learning,”Solar Physics, vol. 296, no. 3, p. 56, 2021

  42. [42]

    Improving the spatial resolution of sdo/hmi transverse and line-of-sight magnetograms using gst/niris data with machine learning,

    C. Xu, Y . Xu, J. T. Wang, Q. Li, and H. Wang, “Improving the spatial resolution of sdo/hmi transverse and line-of-sight magnetograms using gst/niris data with machine learning,”Astronomy & Astrophysics, vol. 697, p. A110, 2025

  43. [43]

    Solaris: A foundation model of the sun,

    H. A. Majid, P. Sittoni, and F. Tudisco, “Solaris: A foundation model of the sun,”arXiv preprint arXiv:2411.16339, 2024

  44. [44]

    et al.: 2025, Surya: Foundation Model for Heliophysics, arXiv:2508.14112 [astro-ph.SR], submitted

    S. Roy, J. Schmude, R. Lal, V . Gaur, M. Freitag, J. Kuehnert, T. van Kessel, D. V . Hegde, A. Muñoz-Jaramillo, J. Jakubiket al., “Surya: Foundation model for heliophysics,”arXiv preprint arXiv:2508.14112, 2025

  45. [45]

    A simple framework for contrastive learning of visual representations,

    T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” inInternational conference on machine learning. PmLR, 2020, pp. 1597–1607

  46. [46]

    Momentum contrast for unsupervised visual representation learning,

    K. He, H. Fan, Y . Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9729–9738

  47. [47]

    Representation Learning with Contrastive Predictive Coding

    A. v. d. Oord, Y . Li, and O. Vinyals, “Representation learning with contrastive predictive coding,”arXiv preprint arXiv:1807.03748, 2018. UNDER REVIEW 12

  48. [48]

    Unsupervised learning of visual features by contrasting cluster assign- ments,

    M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, “Unsupervised learning of visual features by contrasting cluster assign- ments,”Advances in neural information processing systems, vol. 33, pp. 9912–9924, 2020

  49. [49]

    Emerging properties in self-supervised vision transformers,

    M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9650–9660

  50. [50]

    DINOv2: Learning Robust Visual Features without Supervision

    M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023

  51. [51]

    Masked autoencoders are scalable vision learners,

    K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16000–16009

  52. [52]

    iBOT: Image BERT Pre-Training with Online Tokenizer

    J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, and T. Kong, “ibot: Image bert pre-training with online tokenizer,” arXiv preprint arXiv:2111.07832, 2021

  53. [53]

    Lit: Zero-shot transfer with locked-image text tuning,

    X. Zhai, X. Wang, B. Mustafa, A. Steiner, D. Keysers, A. Kolesnikov, and L. Beyer, “Lit: Zero-shot transfer with locked-image text tuning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 18123–18133

  54. [54]

    Scaling up visual and vision-language representation learning with noisy text supervision,

    C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig, “Scaling up visual and vision-language representation learning with noisy text supervision,” in International conference on machine learning. PMLR, 2021, pp. 4904–4916

  55. [55]

    Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data,

    O. Manas, A. Lacoste, X. Giró-i Nieto, D. Vazquez, and P. Rodriguez, “Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9414–9423

  56. [56]

    Croma: Remote sensing representations with contrastive radar-optical masked autoencoders,

    A. Fuller, K. Millard, and J. Green, “Croma: Remote sensing representations with contrastive radar-optical masked autoencoders,” Advances in Neural Information Processing Systems, vol. 36, pp. 5506–5538, 2023

  57. [57]

    Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning,

    Z. Xie, Y. Lin, Z. Zhang, Y. Cao, S. Lin, and H. Hu, “Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 16684–16693

  58. [58]

    Dense contrastive learning for self-supervised visual pre-training,

    X. Wang, R. Zhang, C. Shen, T. Kong, and L. Li, “Dense contrastive learning for self-supervised visual pre-training,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 3024–3033

  59. [59]

    A deep learning framework for instrument-to-instrument translation of solar observation data,

    R. Jarolim, A. Veronig, W. Pötzi, and T. Podladchikova, “A deep learning framework for instrument-to-instrument translation of solar observation data,” Nature Communications, vol. 16, no. 1, p. 3157, 2025

  60. [60]

    Exploring the limits of synthetic creation of solar euv images via image-to-image translation,

    V. Salvatelli, L. F. Dos Santos, S. Bose, B. Neuberg, M. C. Cheung, M. Janvier, M. Jin, Y. Gal, and A. G. Baydin, “Exploring the limits of synthetic creation of solar euv images via image-to-image translation,” The Astrophysical Journal, vol. 937, no. 2, p. 100, 2022

  61. [61]

    Operational solar flare prediction model using deep flare net,

    N. Nishizuka, Y. Kubo, K. Sugiura, M. Den, and M. Ishii, “Operational solar flare prediction model using deep flare net,” Earth, Planets and Space, vol. 73, no. 1, p. 64, 2021

  62. [62]

    Explaining full-disk deep learning model for solar flare prediction using attribution methods,

    C. Pandey, R. A. Angryk, and B. Aydin, “Explaining full-disk deep learning model for solar flare prediction using attribution methods,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2023, pp. 72–89