arxiv: 2509.25210 · v4 · submitted 2025-09-21 · 💻 cs.LG · cs.AI · physics.ao-ph

STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting

Pith reviewed 2026-05-18 15:07 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · physics.ao-ph
keywords weather forecasting · regional boundary alignment · attention mechanism · mixture of experts · extreme event prediction · ensemble forecasting · global-regional integration

The pith

STCast adapts regional boundaries from global weather fields using attention to improve forecast accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a data-driven method to move beyond fixed or cropped boundaries when combining global atmospheric models with finer regional forecasts. It claims that an attention mechanism can learn to shift and refine those boundaries dynamically by matching spatial patterns between the two scales. A separate module routes different months to specialized sub-networks so the model captures seasonal variations without forcing one set of weights to handle all conditions. If these adaptations work as described, the same network can deliver stronger results on both broad global predictions and localized detail, plus better handling of extreme events and ensemble spreads. A sympathetic reader would care because many current AI weather systems still rely on rigid region definitions that limit how well they generalize across seasons or geographies.

Core claim

STCast employs a Spatial-Aligned Attention mechanism to initialize and then iteratively refine regional boundaries by aligning global and regional spatial distributions according to learned attention patterns. It pairs this with a Temporal Mixture-of-Experts module that routes atmospheric variables from distinct months through a discrete Gaussian distribution to specialized experts. When evaluated on global forecasting, regional forecasting, extreme event prediction, and ensemble forecasting, the resulting model shows consistent gains over prior state-of-the-art approaches across all four tasks.
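
To make the alignment idea concrete, the following is a minimal sketch of generic scaled dot-product cross-attention in which regional tokens query global tokens. It is not the paper's SAA implementation: the function names (`softmax`, `cross_attention`), the toy token vectors, and the omission of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(regional_tokens, global_tokens):
    """Attend from each regional token (query) to all global tokens (keys/values).

    Hypothetical simplification: queries, keys, and values are the raw token
    vectors; a real model would apply learned linear projections first.
    Returns (outputs, weights), where weights[i][j] is how strongly regional
    token i attends to global token j.
    """
    dim = len(global_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in regional_tokens:
        # Scaled dot product between this query and every global key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in global_tokens]
        w = softmax(scores)
        # Attention-weighted average of the global value vectors.
        out = [sum(wj * v[d] for wj, v in zip(w, global_tokens)) for d in range(dim)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy example: 2 regional tokens, 3 global tokens, feature dimension 2.
regional = [[1.0, 0.0], [0.0, 1.0]]
global_ = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs, weights = cross_attention(regional, global_)
```

Each row of `weights` sums to 1 and shows which global locations a regional location draws on. How STCast converts such patterns into boundary updates is specified in the paper, not in this sketch.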

What carries the argument

Spatial-Aligned Attention (SAA) mechanism that aligns global and regional spatial distributions to initialize and adaptively refine boundaries based on attention-derived patterns, paired with Temporal Mixture-of-Experts (TMoE) that routes monthly data to specialized experts via a discrete Gaussian distribution.

If this is right

  • Regional forecasts can be generated without manually choosing fixed crop boundaries or solving separate physics boundary equations.
  • The same trained model can handle both global-scale and regional-scale outputs by learning to adjust the interface between them.
  • Monthly atmospheric patterns are captured more effectively when routed to distinct expert sub-networks instead of a single shared set of weights.
  • Extreme event and ensemble prediction tasks benefit from the same adaptive boundary and temporal routing components without task-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on climate projection ensembles where boundary conditions vary more strongly across decades.
  • If the learned alignments prove stable, they might supply physically interpretable diagnostics for where global models lose skill at regional scales.
  • Extending the routing logic beyond months to other slow-varying drivers such as ENSO phases could further reduce the need for separate models per regime.

Load-bearing premise

The attention-derived alignment patterns reflect physically meaningful and generalizable boundary adjustments rather than dataset-specific correlations.

What would settle it

The premise would be undermined if the performance gains disappeared or reversed on a held-out test set drawn from different years, different geographic domains, or different climate regimes, with the training distribution left unchanged.

Figures

Figures reproduced from arXiv: 2509.25210 by Hao Chen, Jie Zhang, Lei Bai, Song Guo, Tao Han.

Figure 1. (1) Illustration of 3 regional forecasting strategies: …
Figure 2. Illustration of our method. (a) The overall structure of low-resolution global weather forecasting, which includes input …
Figure 3. Illustration of Spatial-Aligned Attention.
Figure 5. Comparison of our method with 6 competitors on denormalized RMSE
Figure 7. Visualization of 100-day prediction of Z500
Figure 8. Visualization of 6-hour regional weather prediction on MSL and U10 among Direct Training, OneForeCast, and Ours.
Figure 9. Typhoon Track Assessment among Ours and ECMWF. (a), (b), (c) are the initial, predicted, and Real Mean Sea Level (MSL). (d) and (e) are the Typhoon Track Comparison in Typhoon Ewiniar and Yinxing, respectively.
Figure 10. 120-hour comparative analysis of RMSE ↓ across 10 data-driven models for four variables, including Z500, T850, T2M, and U10. Results are collected from EWMoE (Gan et al. 2025), WeatherGFT (Xu et al. 2024) and WeatherBench (Rasp et al. 2024) in https://sites.research.google/gr/weatherbench/deterministic-scores.
Figure 11. 4-day comparative analysis of MDE (km) ↓ between ECMWF and Ours (STCast). For Typhoon Yinxing, the mean errors between ECMWF and STCast are 165.25 km and 67.1 km. For Typhoon Ewiniar, the mean errors between ECMWF and STCast are 138.82 km and 109.34 km.
… These metrics are defined as follows:
$$\mathrm{MDE} = \frac{1}{N} \sum_{i=1}^{N} d(P_{\mathrm{pred}}, P_{\mathrm{obs}}), \tag{19}$$
$$d(P_1, P_2) = 2R \cdot \arcsin\left(\sqrt{a}\right), \tag{20}$$
$$a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos\phi_1 \cdot \cos\phi_2 \ldots$$
Figure 12. 6-hour forecast results of regional weather among different models.
Figure 13. 0.5-day forecast results of regional weather among different models.
Figure 14. 1-day forecast results of regional weather among different models.
Figure 15. 1.5-day forecast results of regional weather among different models.
Figure 16. 2-day forecast results of regional weather among different models.
Figure 17. 2.5-day forecast results of regional weather among different models.
Figure 18. 3-day forecast results of regional weather among different models.
Figure 19. 3.5-day forecast results of regional weather among different models.
Figure 20. 4-day forecast results of regional weather among different models.
Figure 21. 4.5-day forecast results of regional weather among different models.
Figure 22. 5-day forecast results of regional weather among different models.
Figure 23. 5.5-day forecast results of regional weather among different models.
Figure 24. 6-day forecast results of regional weather among different models.
Figure 25. 6.5-day forecast results of regional weather among different models.
Figure 26. 7-day forecast results of regional weather among different models.
Figure 27. 7.5-day forecast results of regional weather among different models.
Figure 28. 8-day forecast results of regional weather among different models.
Figure 29. 8.5-day forecast results of regional weather among different models.
Figure 30. 9-day forecast results of regional weather among different models.
Figure 31. 9.5-day forecast results of regional weather among different models.
Figure 32. 10-day forecast results of regional weather among different models.
Figure 33. 6-hour forecast results of global weather among different models.
Figure 34. 0.5-day forecast results of global weather among different models.
Figure 35. 1-day forecast results of global weather among different models.
Figure 36. 1.5-day forecast results of global weather among different models.
Figure 37. 2-day forecast results of global weather among different models.
Figure 38. 2.5-day forecast results of global weather among different models.
Figure 39. 3-day forecast results of global weather among different models.
Figure 40. 3.5-day forecast results of global weather among different models.
Figure 41. 4-day forecast results of global weather among different models.
Figure 42. 4.5-day forecast results of global weather among different models.
Figure 43. 5-day forecast results of global weather among different models.
Figure 44. 5.5-day forecast results of global weather among different models.
Figure 45. 6-day forecast results of global weather among different models.
Figure 46. 6.5-day forecast results of global weather among different models.
Figure 47. 7-day forecast results of global weather among different models.
Figure 48. 7.5-day forecast results of global weather among different models.
Figure 49. 8-day forecast results of global weather among different models.
Figure 50. 8.5-day forecast results of global weather among different models.
Figure 51. 9-day forecast results of global weather among different models.
Figure 52. 9.5-day forecast results of global weather among different models.
Figure 53. 10-day forecast results of global weather among different models.
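
The Figure 11 caption quotes the paper's mean distance error (MDE), which averages a great-circle distance between predicted and observed typhoon positions. A minimal sketch of that metric follows. The source text truncates the definition of `a` after the `cos φ1 · cos φ2` factor, so the remaining `sin²(Δλ/2)` term used here is the standard haversine completion and is an assumption about the elided part. The Earth radius value, the function names, and the track coordinates are likewise hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the source does not state R


def haversine_km(p1, p2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dphi = lat2 - lat1
    dlam = lon2 - lon1
    # Standard haversine term; the sin^2(dlam / 2) factor is assumed, see lead-in.
    a = math.sin(dphi / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlam / 2) ** 2
    a = min(1.0, a)  # guard against rounding slightly above 1
    return 2 * radius_km * math.asin(math.sqrt(a))


def mean_distance_error(pred_track, obs_track):
    """Average haversine distance over paired track positions."""
    if len(pred_track) != len(obs_track) or not pred_track:
        raise ValueError("tracks must be non-empty and of equal length")
    total = sum(haversine_km(p, o) for p, o in zip(pred_track, obs_track))
    return total / len(pred_track)


# Made-up (lat, lon) positions for illustration only, not data from the paper.
pred = [(15.0, 130.0), (16.0, 128.5), (17.2, 127.0)]
obs = [(15.0, 130.0), (16.2, 128.0), (17.0, 126.0)]
mde_km = mean_distance_error(pred, obs)
```

Identical tracks give an MDE of zero, and the result is in kilometres because `R` is given in kilometres.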
Original abstract

To gain finer regional forecasts, many works have explored the regional integration from the global atmosphere, e.g., by solving boundary equations in physics-based methods or cropping regions from global forecasts in data-driven methods. However, the effectiveness of these methods is often constrained by static and imprecise regional boundaries, resulting in poor generalization ability. To address this issue, we propose Spatial-Temporal Weather Forecasting (STCast), a novel AI-driven framework for adaptive regional boundary optimization and dynamic monthly forecast allocation. Specifically, our approach employs a Spatial-Aligned Attention (SAA) mechanism, which aligns global and regional spatial distributions to initialize boundaries and adaptively refines them based on attention-derived alignment patterns. Furthermore, we design a Temporal Mixture-of-Experts (TMoE) module, where atmospheric variables from distinct months are dynamically routed to specialized experts using a discrete Gaussian distribution, enhancing the model's ability to capture temporal patterns. Beyond global and regional forecasting, we evaluate our STCast on extreme event prediction and ensemble forecasting. Experimental results demonstrate consistent superiority over state-of-the-art methods across all four tasks. Code: https://github.com/chenhao-zju/STCast

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes STCast, an AI-driven framework for weather forecasting that introduces a Spatial-Aligned Attention (SAA) module to adaptively align global and regional spatial distributions and refine boundaries based on attention patterns, along with a Temporal Mixture-of-Experts (TMoE) module that routes atmospheric variables from distinct months to specialized experts via a discrete Gaussian distribution. The model is evaluated on global and regional forecasting, extreme event prediction, and ensemble forecasting, with the central claim being consistent superiority over state-of-the-art methods across all four tasks.

Significance. If the adaptive boundary alignment proves robust and generalizable beyond the training distribution, the work could meaningfully advance data-driven regional weather forecasting by overcoming limitations of static boundaries in both physics-based and ML approaches, with potential benefits for extreme event and ensemble predictions.

major comments (2)
  1. [§3.1] §3.1 (SAA module description): The central generalization claim across the four tasks rests on the assumption that attention-derived alignment patterns produce physically meaningful and robust boundary adjustments. No analysis is provided (e.g., attention map visualizations compared to known meteorological fronts, or explicit distribution-shift experiments) to distinguish these patterns from dataset-specific statistical correlations, which directly bears on whether the reported gains will hold on independent test sets.
  2. [§4] §4 (Experimental results): While consistent outperformance is asserted, the section lacks reported details on dataset sizes, exact error metrics with confidence intervals, statistical significance tests, and full ablation controls for the SAA and TMoE components; without these, it is difficult to rule out that gains arise from hyperparameter tuning differences rather than the proposed modules.
minor comments (2)
  1. [§3.2] The notation for the discrete Gaussian distribution in the TMoE routing could be clarified with an explicit equation or pseudocode to aid reproducibility.
  2. [Figures] Figure captions for attention visualizations (if present) should explicitly state the color scale and what the highlighted regions represent in physical terms.
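
One possible reading of the routing raised in the first minor comment is sketched below. Every detail is an assumption rather than the paper's definition: experts are indexed 0 to K-1, the calendar month is mapped linearly onto that index range, and each expert k receives a weight proportional to exp(-(k - μ)² / (2σ²)), normalized over the integer indices. The names `month_to_center`, `discrete_gaussian_weights`, and `route`, and the default `sigma`, are hypothetical.

```python
import math


def month_to_center(month, num_experts):
    """Map a calendar month (1..12) linearly onto the expert index range [0, K-1]."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (month - 1) * (num_experts - 1) / 11.0


def discrete_gaussian_weights(month, num_experts, sigma=1.0):
    """Gaussian-shaped weights evaluated at integer expert indices, normalized to sum to 1."""
    center = month_to_center(month, num_experts)
    unnorm = [
        math.exp(-((k - center) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_experts)
    ]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def route(month, expert_outputs, sigma=1.0):
    """Combine scalar expert outputs using the month-dependent weights."""
    w = discrete_gaussian_weights(month, len(expert_outputs), sigma)
    return sum(wk * yk for wk, yk in zip(w, expert_outputs))


weights_jan = discrete_gaussian_weights(month=1, num_experts=4)
weights_dec = discrete_gaussian_weights(month=12, num_experts=4)
```

A linear month-to-expert mapping treats December and January as far apart, which a circular distance would avoid. Whether the paper handles this wrap-around is not stated in the abstract.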

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions planned for the manuscript.

Point-by-point responses
  1. Referee: [§3.1] §3.1 (SAA module description): The central generalization claim across the four tasks rests on the assumption that attention-derived alignment patterns produce physically meaningful and robust boundary adjustments. No analysis is provided (e.g., attention map visualizations compared to known meteorological fronts, or explicit distribution-shift experiments) to distinguish these patterns from dataset-specific statistical correlations, which directly bears on whether the reported gains will hold on independent test sets.

    Authors: We agree that additional evidence is required to substantiate that the attention patterns yield physically meaningful boundary adjustments rather than dataset-specific correlations. In the revised manuscript we will include visualizations of SAA attention maps overlaid on standard meteorological fields and compare them against documented fronts and boundaries. We will also add distribution-shift experiments using held-out years and geographically distinct regions to evaluate robustness outside the original training distribution.
    Revision: yes

  2. Referee: [§4] §4 (Experimental results): While consistent outperformance is asserted, the section lacks reported details on dataset sizes, exact error metrics with confidence intervals, statistical significance tests, and full ablation controls for the SAA and TMoE components; without these, it is difficult to rule out that gains arise from hyperparameter tuning differences rather than the proposed modules.

    Authors: We acknowledge that the current experimental section would benefit from greater rigor. In the revision we will report exact training and test dataset sizes, present all primary metrics together with confidence intervals obtained from multiple random seeds, include statistical significance tests (paired t-tests) against baselines, and expand the ablation studies to fully isolate the contributions of SAA and TMoE while controlling for hyperparameter budgets.
    Revision: yes
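
The statistical reporting promised in the second response can be illustrated with a short sketch of a paired t-test and a 95% confidence interval over per-seed scores. The RMSE values below are placeholders invented for illustration, not results from the paper, and the function name `paired_t` is hypothetical. The critical value 2.776 is the two-sided 95% Student's t quantile for 4 degrees of freedom, matching the assumed five seeds.

```python
import math
import statistics

# Hypothetical per-seed RMSE values (placeholders, not results from the paper).
baseline_rmse = [0.520, 0.515, 0.530, 0.525, 0.518]
model_rmse = [0.500, 0.498, 0.512, 0.505, 0.501]

# Two-sided 95% critical value of Student's t for df = 4 (n = 5 paired seeds).
T_CRIT_DF4 = 2.776


def paired_t(a, b):
    """Return (mean difference, t statistic, standard error) for paired samples a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err, std_err


mean_diff, t_stat, std_err = paired_t(baseline_rmse, model_rmse)
ci_low = mean_diff - T_CRIT_DF4 * std_err
ci_high = mean_diff + T_CRIT_DF4 * std_err
significant = abs(t_stat) > T_CRIT_DF4

print(f"mean RMSE reduction = {mean_diff:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
print(f"t = {t_stat:.2f}, significant at the 5% level: {significant}")
```

Pairing by seed removes seed-to-seed variance shared by both models, which is why the test operates on differences rather than on the two sets of scores separately. A different number of seeds would need a different critical value.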

Circularity Check

0 steps flagged

No circularity: empirical ML framework with independent experimental validation

full rationale

The paper introduces STCast as a neural architecture combining Spatial-Aligned Attention for adaptive boundaries and Temporal Mixture-of-Experts for monthly routing. All load-bearing claims are empirical performance gains on global/regional forecasting, extremes, and ensembles, measured against external baselines on held-out data. No derivation, uniqueness theorem, or first-principles result is presented that reduces by construction to fitted parameters, self-citations, or renamed inputs. The SAA and TMoE modules are standard attention and MoE designs applied to weather tensors; their effectiveness is tested rather than assumed via internal redefinition. This is a self-contained empirical contribution with no detectable circular steps.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The framework relies on standard neural-network assumptions plus two new architectural modules whose effectiveness is asserted through empirical results rather than derived from first principles.

free parameters (2)
  • number of experts in TMoE
    Chosen to match the number of months or seasonal regimes; value not stated in abstract.
  • attention temperature or scaling factor in SAA
    Controls how sharply boundaries are aligned; typical learned or tuned hyperparameter.
axioms (2)
  • [domain assumption] Atmospheric fields at global and regional scales share sufficient spatial structure that attention can recover meaningful boundary adjustments.
    Invoked when SAA is introduced to initialize and refine boundaries.
  • [domain assumption] Monthly atmospheric variables are sufficiently distinct that routing them to separate experts improves temporal modeling.
    Basis for the discrete Gaussian routing in TMoE.
invented entities (2)
  • Spatial-Aligned Attention (SAA) module (no independent evidence)
    purpose: To align global and regional spatial distributions and adaptively refine boundaries.
    New architectural component introduced in the paper.
  • Temporal Mixture-of-Experts (TMoE) module (no independent evidence)
    purpose: To dynamically route monthly data to specialized experts.
    New architectural component introduced in the paper.

pith-pipeline@v0.9.0 · 5737 in / 1570 out tokens · 30958 ms · 2026-05-18T15:07:43.362464+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SwAIther-Precip: Lead-Time-Aware Bias Correction Enables Kilometer-Scale Downscaling of Global AI Precipitation Forecasts over Switzerland

    physics.ao-ph 2026-05 unverdicted novelty 6.0

    SwAIther-Precip uses lead-time-conditioned U-Net bias correction followed by diffusion-based super-resolution to downscale AIFS forecasts, achieving 48% CRPS reduction and ~4 km effective resolution up to 5 days lead time.

  2. Generative 3D Gaussian Splatting for Arbitrary-Resolution Atmospheric Downscaling and Forecasting

    cs.CV 2026-04 unverdicted novelty 6.0

    A generative 3D Gaussian splatting model with scale-aware attention enables unified arbitrary-resolution forecasting and downscaling of 87 atmospheric variables.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 2 Pith papers

  3. [3]

    Adamov, S.; Oskarsson, J.; Denby, L.; Landelius, T.; Hintz, K.; Christiansen, S.; Schicker, I.; Osuna, C.; Lindsten, F.; Fuhrer, O.; et al. 2025. Building Machine Learning Limited Area Models: Kilometer-Scale Weather Forecasting in Realistic Settings. arXiv preprint arXiv:2504.09340

  4. [4]

    Bauer, P.; Thorpe, A.; and Brunet, G. 2015. The quiet revolution of numerical weather prediction. Nature, 525(7567): 47--55

  5. [5]

    Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; and Tian, Q. 2023. Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970): 533--538

  6. [6]

    Bodnar, C.; Bruinsma, W. P.; Lucic, A.; Stanley, M.; Brandstetter, J.; Garvan, P.; Riechert, M.; Weyn, J.; Dong, H.; Vaughan, A.; et al. 2025. Aurora: A foundation model of the atmosphere. Nature, 641: 1180--1187

  7. [7]

    Bonev, B.; Kurth, T.; Hundt, C.; Pathak, J.; Baust, M.; Kashinath, K.; and Anandkumar, A. 2023. Spherical Fourier neural operators: learning stable dynamics on the sphere. In Proceedings of the 40th International Conference on Machine Learning

  8. [8]

    Chen, H.; Tao, H.; Song, G.; Zhang, J.; Yu, Y.; Dong, Y.; and Bai, L. 2025. VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

  9. [9]

    Chen, K.; Han, T.; Gong, J.; Bai, L.; Ling, F.; Luo, J.-J.; Chen, X.; Ma, L.; Zhang, T.; Su, R.; et al. 2023a. Fengwu: Pushing the skillful global medium-range weather forecast beyond 10 days lead. arXiv preprint arXiv:2304.02948

  10. [10]

    Chen, L.; Zhong, X.; Zhang, F.; Cheng, Y.; Xu, Y.; Qi, Y.; and Li, H. 2023b. FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Climate and Atmospheric Science, 6(1): 190

  11. [11]

    Cheon, M.; Choi, Y.-H.; Kang, S.-Y.; Choi, Y.; Lee, J.-G.; and Kang, D. 2024. Karina: An efficient deep learning model for global weather forecast. arXiv preprint arXiv:2403.10555

  12. [12]

    Dao, T.; Fu, D.; Ermon, S.; Rudra, A.; and Ré, C. 2022. Flashattention: Fast and memory-efficient exact attention with io-awareness. In Advances in Neural Information Processing Systems, volume 35, 16344--16359

  13. [13]

    Gan, L.; Man, X.; Zhang, C.; and Shao, J. 2025. EWMoE: An effective model for global weather forecasting with mixture-of-experts. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, 210--218

  14. [14]

    Gao, Y.; Wu, H.; Shu, R.; Dong, H.; Xu, F.; Chen, R.; Yan, Y.; Wen, Q.; Hu, X.; Wang, K.; et al. 2025. OneForecast: A Universal Framework for Global and Regional Weather Forecasting. In Proceedings of the 42nd International Conference on Machine Learning

  15. [15]

    Gao, Z.; Tan, C.; Wu, L.; and Li, S. Z. 2022. Simvp: Simpler yet better video prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 3170--3180

  16. [16]

    Gong, Y.; He, T.; Chen, M.; Wang, B.; Nie, L.; and Yin, Y. 2024. Spatio-temporal enhanced contrastive and contextual learning for weather forecasting. IEEE Transactions on Knowledge and Data Engineering, 36(8): 4260--4274

  17. [17]

    Han, T.; Guo, S.; Ling, F.; Chen, K.; Gong, J.; Luo, J.; Gu, J.; Dai, K.; Ouyang, W.; and Bai, L. 2024. Fengwu-ghr: Learning the kilometer-scale medium-range global weather forecasting. arXiv preprint arXiv:2402.00059

  18. [18]

    He, J.; Ji, J.; and Lei, M. 2024. Spatio-temporal transformer network with physical knowledge distillation for weather forecasting. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 819--828

  19. [19]

    Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. 2020. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730): 1999--2049

  20. [20]

    Hidayatullah, A. F.; Aditya, S. K.; Gardini, S. T.; et al. 2019. Topic modeling of weather and climate condition on twitter using latent dirichlet allocation (LDA). In IOP Conference Series: Materials Science and Engineering, volume 482, 012033

  21. [21]

    Ji, J.; He, J.; Lei, M.; Wang, M.; and Tang, W. 2024. Spatio-temporal transformer network for weather forecasting. IEEE Transactions on Big Data, 11(2): 372--387

  22. [22]

    Kalnay, E. 2002. Atmospheric modeling, data assimilation and predictability. Cambridge: Cambridge University Press

  23. [23]

    Keisler, R. 2022. Forecasting global weather with graph neural networks. arXiv:2202.07575

  24. [24]

    Kurth, T.; Subramanian, S.; Harrington, P.; Pathak, J.; Mardani, M.; Hall, D.; Miele, A.; Kashinath, K.; and Anandkumar, A. 2023. FourCastNet: Accelerating Global High-Resolution Weather Forecasting Using Adaptive Fourier Neural Operators. In Proceedings of the Platform for Advanced Scientific Computing Conference, PASC '23

  25. [25]

    Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; et al. 2023. Learning skillful medium-range global weather forecasting. Science, 382(6677): 1416--1421

  26. [26]

    Li, Z.; Kovachki, N. B.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; and Anandkumar, A. 2021. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations

  27. [27]

    Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; and Guo, B. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10012--10022

  28. [28]

    Lu, X.; Yu, H.; Ying, M.; Zhao, B.; Zhang, S.; Lin, L.; Bai, L.; and Wan, R. 2021. Western North Pacific tropical cyclone database created by the China Meteorological Administration. Advances in Atmospheric Sciences, 38(4): 690--699

  29. [29]

    Lundquist, K. A.; Chow, F. K.; and Lundquist, J. K. 2010. An immersed boundary method for the weather research and forecasting model. Monthly Weather Review, 138(3): 796--817

  30. [30]

    Lynch, P. 2008. The origins of computer weather prediction and climate modeling. Journal of computational physics, 227(7): 3431--3444

  31. [31]

    Ma, M.; Xie, P.; Teng, F.; Wang, B.; Ji, S.; Zhang, J.; and Li, T. 2023. HiSTGNN: Hierarchical spatio-temporal graph neural network for weather forecasting. Information Sciences, 648: 119580

  32. [32]

    Magnusson, L.; Majumdar, S.; Emerton, R.; Richardson, D.; Alonso-Balmaseda, M.; Baugh, C.; Bechtold, P.; Bidlot, J.; Bonanni, A.; Bonavita, M.; et al. 2021. Tropical cyclone activities at ECMWF. ECMWF Technical Memoranda

  33. [33]

    Manabe, S.; and Bryan, K. 1969. Climate calculations with a combined ocean-atmosphere model. Journal of Atmospheric Sciences, 26(4): 786--789

  34. [34]

    Mani, A. 2012. Analysis and optimization of numerical sponge layers as a nonreflective boundary treatment. Journal of Computational Physics, 231(2): 704--716

  35. [35]

    Molteni, F.; Buizza, R.; Palmer, T. N.; and Petroliagis, T. 1996. The ECMWF ensemble prediction system: Methodology and validation. Quarterly journal of the royal meteorological society, 122(529): 73--119

  36. [36]

    Nguyen, T.; Brandstetter, J.; Kapoor, A.; Gupta, J. K.; and Grover, A. 2023. ClimaX: A foundation model for weather and climate. In International Conference on Machine Learning

  37. [37]

    Nguyen, T.; Shah, R.; Bansal, H.; Arcomano, T.; Maulik, R.; Kotamarthi, R.; Foster, I.; Madireddy, S.; and Grover, A. 2025. Scaling transformer neural networks for skillful and reliable medium-range weather forecasting. In Advances in Neural Information Processing Systems, volume 37, 68740--68771

  38. [38]

    Nipen, T. N.; Haugen, H. H.; Ingstad, M. S.; Nordhagen, E. M.; Salihi, A. F. S.; Tedesco, P.; Seierstad, I. A.; Kristiansen, J.; Lang, S.; Alexe, M.; et al. 2024. Regional data-driven weather modeling with a global stretched-grid. arXiv preprint arXiv:2409.02891

  39. [39]

    Price, I.; Sanchez-Gonzalez, A.; Alet, F.; Andersson, T. R.; El-Kadi, A.; Masters, D.; Ewalds, T.; Stott, J.; Mohamed, S.; Battaglia, P.; et al. 2025. Probabilistic weather forecasting with machine learning. Nature, 637(8044): 84--90

  40. [40]

    Rasp, S.; Hoyer, S.; Merose, A.; Langmore, I.; Battaglia, P.; Russell, T.; Sanchez-Gonzalez, A.; Yang, V.; Carver, R.; Agrawal, S.; et al. 2024. WeatherBench 2: A benchmark for the next generation of data-driven global weather models. Journal of Advances in Modeling Earth Systems, 16(6): e2023MS004019

  41. [41]

    Ritchie, H.; Temperton, C.; Simmons, A.; Hortal, M.; Davies, T.; Dent, D.; and Hamrud, M. 1995. Implementation of the semi-Lagrangian method in a high-resolution version of the ECMWF forecast model. Monthly Weather Review, 123(2): 489--514

  42. [42]

    Sabathier, M.; Pannekoucke, O.; Maget, V.; and Dahmen, N. 2023. Boundary conditions for the parametric Kalman filter forecast. Journal of Advances in Modeling Earth Systems, 15(10): e2022MS003462

  43. [43]

    Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; and Dean, J. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In International Conference on Learning Representations

  44. [44]

    Subich, C.; Husain, S. Z.; Separovic, L.; and Yang, J. 2025. Fixing the double penalty in data-driven weather forecasting through a modified spherical harmonic loss function. In Proceedings of the 42nd International Conference on Machine Learning

  45. [45]

    Veillette, M.; Samsi, S.; and Mattioli, C. 2020. SEVIR: A storm event imagery dataset for deep learning applications in radar and satellite meteorology. In Advances in Neural Information Processing Systems, volume 33, 22009--22019

  46. [46]

    Verma, Y.; Heinonen, M.; and Garg, V. 2024. ClimODE: Climate Forecasting With Physics-informed Neural ODEs. In International Conference on Learning Representations

  47. [47]

    Wang, B.; Lu, J.; Yan, Z.; Luo, H.; Li, T.; Zheng, Y.; and Zhang, G. 2019. Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2087--2095

  48. [48]

    Wang, X.; Chen, K.; Liu, L.; Han, T.; Li, B.; and Bai, L. 2025a. Global tropical cyclone intensity forecasting with multi-modal multi-scale causal autoregressive model. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1--5. IEEE

  49. [49]

    Wang, X.; Liu, L.; Chen, K.; Han, T.; Li, B.; and Bai, L. 2025b. VQLTI: Long-Term Tropical Cyclone Intensity Forecasting with Physical Constraints. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, 28476--28484

  50. [50]

    Wu, B.; Chen, W.; Wang, W.; Peng, B.; Sun, L.; and Chen, L. 2025. WeatherGNN: Exploiting meteo- and spatial-dependencies for local numerical weather prediction bias-correction. In International Joint Conference on Artificial Intelligence

  51. [51]

    Wu, G.; Zhou, X.; Xu, X.; Huang, J.; Duan, A.; Yang, S.; Hu, W.; Ma, Y.; Liu, Y.; Bian, J.; et al. 2023. An integrated research plan for the Tibetan Plateau land--air coupled system and its impacts on the global climate. Bulletin of the American Meteorological Society, 104(1): E158--E177

  52. [52]

    Wu, H.; Weng, K.; Zhou, S.; Huang, X.; and Xiong, W. 2024a. Neural manifold operators for learning the evolution of physical dynamics. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3356--3366

  53. [53]

    Wu, H.; Xu, F.; Chen, C.; Hua, X.-S.; Luo, X.; and Wang, H. 2024b. PastNet: Introducing physical inductive biases for spatio-temporal video prediction. In Proceedings of the 32nd ACM International Conference on Multimedia, 2917--2926

  54. [54]

    Xiong, W.; Ma, M.; Huang, X.; Zhang, Z.; Sun, P.; and Tian, Y. 2023. KoopmanLab: machine learning for solving complex physics equations. APL Machine Learning, 1(3)

  55. [55]

    Xu, W.; Ling, F.; Zhang, W.; Han, T.; Chen, H.; Ouyang, W.; and Bai, L. 2024. Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling. In Advances in Neural Information Processing Systems

  56. [56]

    Ying, M.; Zhang, W.; Yu, H.; Lu, X.; Feng, J.; Fan, Y.; Zhu, Y.; and Chen, D. 2014. An overview of the China Meteorological Administration tropical cyclone database. Journal of Atmospheric and Oceanic Technology, 31(2): 287--301

  57. [57]

    Zhang, R.-H.; Tian, F.; and Wang, X. 2018. Ocean chlorophyll-induced heating feedbacks on ENSO in a coupled ocean physics--biology model forced by prescribed wind anomalies. Journal of Climate, 31(5): 1811--1832