pith.

arXiv: 2606.19026 · v1 · pith:LFFAVRV3new · submitted 2026-06-17 · 💻 cs.LG · cs.AI · physics.ao-ph

A Hybrid LSTM-Vision Transformer Architecture for Predicting HRRR Forecast Errors

Pith reviewed 2026-06-26 21:20 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · physics.ao-ph
keywords: forecast error prediction, HRRR, LSTM, Vision Transformer, planetary boundary layer, precipitation forecast, mesonet profiler, hybrid architecture

The pith

A hybrid LSTM-Vision Transformer improves HRRR forecast error predictions by incorporating vertical atmospheric profiles from profilers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hybrid LSTM-Vision Transformer that combines temporal learning from surface mesonet observations with vertical profiles from the profiler network to predict errors in HRRR forecasts of hourly precipitation, 10 m wind speed, and 2 m temperature. Adding the profiler data raises skill over a pure LSTM baseline for all three variables, with the largest gains at shorter lead times and during periods of enhanced PBL activity. For precipitation error prediction the improvement is roughly twofold, and the hybrid better captures convectively driven error evolution while reducing the degradation associated with PBL processes. The authors argue that vertically informed attention offers a physically meaningful pathway to better error forecasts in high-resolution NWP.

Core claim

Incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity; for precipitation the LSTM-ViT framework achieves approximately a twofold increase in predictive skill while better capturing convectively driven error evolution.

What carries the argument

The hybrid LSTM-Vision Transformer that fuses temporal sequence learning from surface observations with vertically informed attention mechanisms applied to atmospheric profiles.
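The fusion described above can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The profile is assumed to be a single variable on evenly spaced levels, and contiguous height patches play the role of ViT tokens. The query/key/value projections are identity maps; a real ViT learns them, along with position embeddings, multiple heads, and MLP blocks. `lstm_hidden` stands in for the LSTM's final hidden state. Fusion by concatenation and all function names are hypothetical.

```python
import math


def patchify(profile, patch_size):
    """Split a 1-D vertical profile (ordered surface to top) into contiguous height patches."""
    if patch_size <= 0 or len(profile) % patch_size:
        raise ValueError("profile length must be a positive multiple of patch_size")
    return [list(profile[i:i + patch_size]) for i in range(0, len(profile), patch_size)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    Returns (outputs, weights); weights[i][j] is the attention token i pays to token j.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)])
    return outputs, weights


def fuse(lstm_hidden, profile, patch_size):
    """Concatenate a surface LSTM state with the mean-pooled attended profile tokens."""
    attended, weights = self_attention(patchify(profile, patch_size))
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    return list(lstm_hidden) + pooled, weights
```

Each row of the returned `weights` sums to one. After training, the analogous attention maps could be inspected height patch by height patch.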

If this is right

  • Forecast error prediction skill increases most at short lead times when vertical structure is supplied.
  • Precipitation error forecasts show the largest relative gain and better track convective error sources.
  • Degradation during enhanced PBL activity is reduced across temperature, wind, and precipitation predictions.
  • The combined architecture supplies physically interpretable guidance on model bias for operational use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same profiler-augmented approach could be tested on other high-resolution NWP models that share similar PBL and convection error patterns.
  • Attention weights might be inspected post-training to identify which vertical levels most influence error predictions during convective events.
  • Extending the framework to additional surface variables or to regions with sparser profiler coverage would test whether the vertical information remains the dominant driver of gains.

Load-bearing premise

The observed skill gains result from the vertical attention mechanisms capturing PBL and convective processes rather than from added model capacity or dataset effects.

What would settle it

An experiment that matches total parameter count between the hybrid model and baseline LSTM but removes the profiler input, then measures whether the skill advantage disappears.
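The control described above needs a baseline whose parameter count matches the hybrid's. The following is a minimal sketch. It assumes PyTorch-style LSTM layers, which have input-to-hidden and hidden-to-hidden weights for the four gates plus two bias vectors per layer. The helper names are hypothetical. The hybrid's target total would have to be counted separately, including its ViT encoder and output head.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1):
    """Parameters in a stacked LSTM with two bias vectors (b_ih and b_hh) per layer."""
    total = 0
    in_size = input_size
    for _ in range(num_layers):
        # 4 gates x (input weights + recurrent weights) + 2 bias vectors of length 4*hidden
        total += 4 * hidden_size * (in_size + hidden_size) + 8 * hidden_size
        in_size = hidden_size  # deeper layers consume the previous layer's hidden state
    return total


def matched_hidden_size(input_size, target_params, num_layers=1):
    """Smallest hidden size whose LSTM parameter count reaches target_params."""
    h = 1
    while lstm_param_count(input_size, h, num_layers) < target_params:
        h += 1
    return h
```

For example, `lstm_param_count(10, 20)` gives 4·20·(10 + 20) + 8·20 = 2560, and `matched_hidden_size(10, 2560)` returns 20. The enlarged baseline would then be trained on surface inputs only, so any remaining skill gap can be attributed to the profiler input rather than to capacity.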

Figures

Figures reproduced from arXiv: 2606.19026 by Chris D. Thorncroft, David Aaron Evans, Jay C. Rothenberger, Kara J. Sulia, Nick P. Bassill.

Figure 1. New York State overlaid with panels corresponding to NCEI climate …
Figure 2. This graphic illustrates the LSTM+ViT encoder–decoder workflow at a …
Figure 3. The structure of the Vision Transformer (ViT) encoder unit as implemented …
Figure 4. Scatterplot of the precipitation error across the NYSM network and all …
Figure 6. Although the LSTM-ViT model more effectively captures both positive and …
Figure 5. Confusion matrix summarizing the precision of Hybrid predictions for …
Figure 6. New York State MAE overlaid by NCEI climate division (NCEI, 2015).
Figure 7. From top to bottom, panels show aggregate RMSE in mm hr⁻¹ …
Figure 8. NYSM, MAE of LSTM-ViT precipitation-error predictions in mm hr⁻¹ …
Figure 9. Scatterplot of the wind error across the NYSM network and all forecast …
Figure 10. New York State MAE overlaid by NCEI climate division (NCEI, 2015).
Figure 11. From top to bottom, panels show aggregate RMSE in m s⁻¹ …
Figure 12. NYSM, MAE of LSTM-ViT wind-error predictions in m s⁻¹ …
Figure 13. Scatterplot of the temperature error across the NYSM network and …
Figure 14. New York State MAE overlaid by NCEI climate division (NCEI, 2015).
Figure 15. From top to bottom, panels show aggregate RMSE in …
Figure 16. NYSM, MAE of LSTM-ViT temperature-error predictions in …
Original abstract

Forecast errors in high-resolution numerical weather prediction (NWP) systems are often linked to unresolved planetary boundary layer (PBL) processes, convection, terrain-induced circulations, and other vertically structured atmospheric phenomena. Previous work demonstrated that Long Short-Term Memory (LSTM) networks can successfully predict forecast errors in the High-Resolution Rapid Refresh (HRRR) model using mesonet observations, but we believe performance degradation is linked to periods of complex vertical atmospheric evolution. To address this limitation, we develop a hybrid LSTM-Vision Transformer (LSTM-ViT) framework that combines temporal sequence learning from surface observations with atmospheric profiles from the New York State Mesonet profiler network. The LSTM-ViT framework is trained to predict HRRR hourly precipitation, 10 m wind speed, and 2 m temperature forecast errors at individual mesonet stations. Across all three predictors, incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity. Improvements are particularly pronounced for precipitation forecast error, where the LSTM-ViT framework achieves approximately a twofold increase in predictive skill relative to the baseline LSTM while better capturing convectively driven error evolution and reducing degradation associated with PBL processes. These results demonstrate that combining temporal sequence learning with vertically informed attention mechanisms provides a physically meaningful pathway for improving forecast error prediction in operational NWP systems. Our research offers forecasters enhanced guidance regarding model bias and forecast confidence.
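The learning target in the abstract is the HRRR forecast error at each mesonet station and lead time. The following is a minimal sketch of how such targets might be paired. It assumes error is defined as forecast minus observation. That sign convention, the dictionary layout, and the `forecast_errors` helper are all hypothetical.

```python
from datetime import datetime, timedelta


def forecast_errors(forecasts, observations):
    """Pair HRRR forecasts with station observations at the same valid time.

    forecasts:    {(init_time, lead_hours): forecast value}
    observations: {valid_time: observed value}
    Returns {(init_time, lead_hours): forecast - observation}; pairs without
    a matching observation are skipped rather than filled.
    """
    errors = {}
    for (init_time, lead_hours), fcst in forecasts.items():
        valid_time = init_time + timedelta(hours=lead_hours)
        obs = observations.get(valid_time)
        if obs is None:
            continue  # missing mesonet report: no training target for this pair
        errors[(init_time, lead_hours)] = fcst - obs
    return errors


if __name__ == "__main__":
    t0 = datetime(2024, 7, 1, 12)
    fc = {(t0, 1): 2.5, (t0, 2): 0.0, (t0, 3): 1.0}
    ob = {t0 + timedelta(hours=1): 1.0, t0 + timedelta(hours=2): 0.5}
    print(forecast_errors(fc, ob))  # lead 3 has no observation and is dropped
```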

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces a hybrid LSTM-Vision Transformer (LSTM-ViT) model that fuses temporal learning from surface mesonet observations with vertical atmospheric profiles from the New York State Mesonet profiler network to predict HRRR forecast errors in hourly precipitation, 10 m wind speed, and 2 m temperature. It reports that adding profiler-derived structure improves skill over a baseline LSTM across all three variables, with the largest gains at short lead times and during enhanced PBL activity; precipitation error prediction shows an approximately twofold skill increase while better capturing convective error evolution.

Significance. If the skill gains are shown to arise specifically from the vertically informed attention rather than capacity or data-volume effects, the work would offer a concrete, physically grounded route to reduce NWP error prediction degradation during complex PBL and convective regimes, with potential value for operational forecast guidance.

major comments (3)
  1. [Results / experimental design] The central attribution of the reported twofold precipitation skill gain and reduced PBL degradation to the LSTM-ViT's vertically informed attention (abstract and results) rests on a comparison solely to an untuned baseline LSTM; no parameter counts, FLOPs, or capacity-matched controls (e.g., deeper LSTM or LSTM with duplicated surface inputs) are described, leaving open the possibility that gains reflect increased model expressivity or input richness rather than the ViT mechanism.
  2. [Results] Post-hoc stratification on PBL-active periods (abstract) introduces selection dependence; without a pre-specified ablation that isolates the profiler profiles or ViT encoder on the full dataset, the mechanistic link between attention on vertical structure and the observed improvements cannot be isolated from dataset-specific effects.
  3. [Abstract / Results] No error bars, statistical significance tests, or explicit train-test split details are provided for the quantitative claims (abstract), weakening the reliability of the reported skill increases.
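Comments 2 and 3 could both be addressed by a paired comparison on the full test set with an uncertainty estimate. One standard option is a moving-block bootstrap of the per-sample difference in absolute error between the hybrid and the baseline. This is an assumption here, not the authors' stated method. Blocks, rather than single hours, are resampled because hourly forecast errors are autocorrelated. The 24-sample default block length and the function name are illustrative.

```python
import random


def block_bootstrap_mae_diff(y, pred_a, pred_b, block_len=24, n_boot=2000, alpha=0.05, seed=0):
    """Moving-block bootstrap CI for MAE(pred_a) - MAE(pred_b) on paired samples.

    Resampling contiguous blocks of per-sample differences keeps the pairing between
    models and preserves short-range autocorrelation (e.g. within a day of hourly data).
    """
    n = len(y)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(y)")
    diffs = [abs(a - t) - abs(b - t) for t, a, b in zip(y, pred_a, pred_b)]
    point = sum(diffs) / n
    rng = random.Random(seed)
    n_blocks = -(-n // block_len)  # ceiling division
    stats = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n - block_len + 1)
            sample.extend(diffs[start:start + block_len])
        stats.append(sum(sample[:n]) / n)  # trim to the original length
    stats.sort()
    lo = stats[int(round(alpha / 2 * n_boot))]
    hi = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return point, (lo, hi)
```

An interval that lies entirely below zero would suggest that `pred_a` has lower MAE than `pred_b` at roughly the 1 - alpha level. Applying the same function to a PBL-active subset defined before evaluation would keep the stratified result comparable to the full-dataset one.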
minor comments (1)
  1. [Methods] Hyperparameter tuning details and the exact definition of 'predictive skill' (e.g., which metric yields the twofold improvement) should be stated explicitly to allow reproduction.
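The metric question raised above matters because "twofold" reads differently under different scores. The following sketch assumes an MSE skill score against a reference forecast. The reference is taken to be a prediction of zero error, which amounts to accepting the HRRR output unchanged. The paper's actual metric is not stated here, and the helper names are hypothetical.

```python
def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def skill_score(pred, obs, ref):
    """MSE skill score: 1 is perfect, 0 equals the reference, negative is worse."""
    return 1.0 - mse(pred, obs) / mse(ref, obs)


def mse_reduction(ss_base, ss_new):
    """Fractional MSE reduction implied by moving from skill ss_base to ss_new.

    Uses MSE = (1 - SS) * MSE_ref, so both scores must share the same reference.
    """
    return 1.0 - (1.0 - ss_new) / (1.0 - ss_base)
```

Under this definition, doubling skill from 0.2 to 0.4 lowers MSE by only 25%, since MSE falls from 0.8 to 0.6 of the reference value. Stating the metric explicitly would therefore let readers translate the reported gain into error units.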

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of experimental design and statistical rigor. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Results / experimental design] The central attribution of the reported twofold precipitation skill gain and reduced PBL degradation to the LSTM-ViT's vertically informed attention (abstract and results) rests on a comparison solely to an untuned baseline LSTM; no parameter counts, FLOPs, or capacity-matched controls (e.g., deeper LSTM or LSTM with duplicated surface inputs) are described, leaving open the possibility that gains reflect increased model expressivity or input richness rather than the ViT mechanism.

    Authors: We agree that the current baseline comparison does not fully isolate the contribution of the vertically informed attention mechanism from potential effects of model capacity or input richness. In the revised manuscript we will report parameter counts and FLOPs for the LSTM-ViT and baseline LSTM, and we will add a capacity-matched control experiment (e.g., a deeper LSTM or an LSTM receiving duplicated surface inputs). These additions will allow a clearer attribution of skill gains to the ViT component. revision: yes

  2. Referee: [Results] Post-hoc stratification on PBL-active periods (abstract) introduces selection dependence; without a pre-specified ablation that isolates the profiler profiles or ViT encoder on the full dataset, the mechanistic link between attention on vertical structure and the observed improvements cannot be isolated from dataset-specific effects.

    Authors: We acknowledge that the PBL-active stratification was performed post-hoc. To address this, the revised manuscript will include a pre-specified ablation study performed on the full dataset that isolates the contribution of the profiler profiles and the ViT encoder. This will provide a more rigorous test of the mechanistic role of vertical structure. revision: yes

  3. Referee: [Abstract / Results] No error bars, statistical significance tests, or explicit train-test split details are provided for the quantitative claims (abstract), weakening the reliability of the reported skill increases.

    Authors: We agree that the absence of error bars, significance testing, and explicit train-test split information limits the strength of the quantitative claims. In the revision we will add error bars to all reported metrics, conduct appropriate statistical significance tests, and provide full details of the train-test split procedure. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical ML training and held-out evaluation

Full rationale

The paper trains LSTM and LSTM-ViT models on mesonet surface and profiler data to predict HRRR forecast errors for precipitation, wind, and temperature, then reports skill metrics on held-out data. The central results (skill gains, especially for precipitation at short leads and PBL-active periods) are obtained via standard supervised training and test-set evaluation rather than any derivation that reduces by construction to fitted parameters or self-citations. No equations, uniqueness theorems, or ansatzes are invoked that collapse the claimed improvements to the inputs; the comparison to baseline LSTM is an external empirical benchmark. Hyperparameter tuning on the dataset is standard practice and does not constitute circularity under the defined patterns.
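Held-out evaluation is only as clean as the split. The split procedure is not described (see major comment 3), so the following is an assumption rather than the authors' method. It uses a chronological split with a buffer between training and test periods, a common safeguard against leakage from autocorrelated hourly data. Hyperparameters would then be tuned on a validation slice carved from the training period, never on the test set. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def chronological_split(timestamps, train_end, test_start):
    """Index lists for a time-ordered split with a discarded buffer.

    Samples before train_end form the training set, samples at or after test_start
    form the test set, and anything in between is dropped so that autocorrelated
    neighbours cannot straddle the boundary.
    """
    if test_start < train_end:
        raise ValueError("test_start must not precede train_end")
    train = [i for i, t in enumerate(timestamps) if t < train_end]
    test = [i for i, t in enumerate(timestamps) if t >= test_start]
    return train, test


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    ts = [t0 + timedelta(hours=h) for h in range(10)]
    print(chronological_split(ts, t0 + timedelta(hours=4), t0 + timedelta(hours=6)))
```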

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on standard supervised learning assumptions plus one domain assumption about profiler data; no new physical entities are postulated and the number of explicit free parameters beyond routine hyperparameters is low.

free parameters (1)
  • LSTM and ViT architecture hyperparameters
    Model depth, attention heads, hidden sizes, and training hyperparameters are selected and optimized on the training data to produce the reported skill gains.
axioms (1)
  • domain assumption: Profiler vertical profiles from the New York State Mesonet accurately capture the atmospheric structure that drives HRRR forecast errors at surface stations.
    Invoked when attributing skill gains to incorporation of profiler-derived structure and when linking gains to PBL activity.

pith-pipeline@v0.9.1-grok · 5820 in / 1384 out tokens · 26731 ms · 2026-06-26T21:20:44.690380+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 21 canonical work pages

  1. Bader, D., and R. Horton, 2023: New York State climate change projections methodology report. Technical report, New York State Climate Impacts Assessment, Columbia University, Lamont-Doherty Earth Observatory, Columbia Climate School. Prepared for the New York State Climate Impacts Assessment.

  2. Bishop, C. M., and H. Bishop, 2023: Deep Learning: Foundations and Concepts. 1st ed., Springer Cham, 649 pp., doi:10.1007/978-3-031-45468-4.

  3. Blaylock, B. K., J. D. Horel, and S. T. Liston, 2017: Cloud archiving and data mining of High-Resolution Rapid Refresh forecast model output. Computers & Geosciences, 109, 43-50, doi:10.1016/j.cageo.2017.08.005.

  4. Brotzge, J. A., and Coauthors, 2020: A technical overview of the New York State Mesonet standard network. Journal of Atmospheric and Oceanic Technology, 37, 1827-1845, doi:10.1175/JTECH-D-19-0220.1.

  5. Campbell, L. S., and W. J. Steenburgh, 2017: The OWLeS IOP2b lake-effect snowstorm: Mechanisms contributing to the Tug Hill precipitation maximum. Monthly Weather Review, 145, 2461-2478, doi:10.1175/MWR-D-16-0460.1.

  6. Clare, M. C. A., M. Sonnewald, R. Lguensat, J. Deshayes, and V. Balaji, 2022: Explainable artificial intelligence for Bayesian neural networks: Toward trustworthy predictions of ocean dynamics. Journal of Advances in Modeling Earth Systems, 14, e2022MS003162, doi:10.1029/2022MS003162.

  7. Dosovitskiy, A., and Coauthors, 2021: An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (ICLR), https://openreview.net/forum?id=YicbFdNTTy.

  8. Dowell, D. C., and Coauthors, 2022: The High-Resolution Rapid Refresh (HRRR): An hourly updating convection-allowing forecast model. Part I: Motivation and system description. Weather and Forecasting, 37, 1371-1395, doi:10.1175/WAF-D-21-0151.1.

  9. Evans, D. A., K. J. Sulia, N. P. Bassill, C. D. Thorncroft, J. C. Rothenberger, and L. C. Gaudet, 2025: Predicting forecast error for the HRRR using LSTM neural networks: A comparative study using New York and Oklahoma State Mesonets. arXiv:2512.14898, https://arxiv.org/abs/2512.14898.

  10. Gagne, D. J., A. McGovern, S. E. Haupt, R. A. Sobash, J. K. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Weather and Forecasting, 32, 1819-1840, doi:10.1175/WAF-D-17-0010.1.

  11. Gaudet, L. C., K. J. Sulia, R. D. Torn, and N. P. Bassill, 2024: Verification of the Global Forecast System, North American Mesoscale Forecast System, and High-Resolution Rapid Refresh model near-surface forecasts by use of the New York State Mesonet. Weather and Forecasting, 39, 369-386, doi:10.1175/WAF-D-23-0094.1.

  12. Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Computation, 9, 1735-1780, doi:10.1162/neco.1997.9.8.1735.

  13. James, E. P., and Coauthors, 2022: The High-Resolution Rapid Refresh (HRRR): An hourly updating convection-allowing forecast model. Part II: Forecast performance. Weather and Forecasting, 37 (8), 1397-1417, doi:10.1175/WAF-D-21-0130.1.

  14. Lam, R., and Coauthors, 2023: Learning skillful medium-range global weather forecasting. Science, 382 (6677), 1416-1421, doi:10.1126/science.adi2336.

  15. Lang, S., and Coauthors, 2024: AIFS - ECMWF's data-driven forecasting system. arXiv:2406.01465.

  16. Mahmood, R., and Coauthors, 2017: Mesonets: Mesoscale weather and climate observations for the United States. Bulletin of the American Meteorological Society, 98, 1349-1361, doi:10.1175/BAMS-D-15-00258.1.

  17. Mansfield, L., and H. Christensen, 2026: Epistemic and aleatoric uncertainty quantification in weather and climate models. Quarterly Journal of the Royal Meteorological Society, doi:10.1002/qj.70219.

  18. McGovern, A., K. L. Elmore, D. J. Gagne, S. E. Haupt, C. D. Karstens, R. Lagerquist, T. Smith, and J. K. Williams, 2017: Using artificial intelligence to improve real-time decision-making for high-impact weather. Bulletin of the American Meteorological Society, 98, 2073-2090, doi:10.1175/BAMS-D-16-0123.1.

  19. Molod, A., H. Salmun, and A. B. Marquardt Collow, 2019: Annual cycle of planetary boundary layer heights estimated from wind profiler network data. Journal of Geophysical Research: Atmospheres, 124 (12), 6207-6221, doi:10.1029/2018JD030102.

  20. National Centers for Environmental Prediction, 2024: High-Resolution Rapid Refresh (HRRR) model. https://rapidrefresh.noaa.gov/hrrr/, accessed 1 Apr. 2025.

  21. NCEI, 2015: U.S. climate divisions. https://www.ncei.noaa.gov/access/monitoring/dyk/us-climate-divisions, accessed 3 Aug. 2023.

  22. NOAA/NCEP MADIS, 2021: MADIS meteorological surface data providers. https://madis.ncep.noaa.gov/mesonet_providers.shtml, accessed 9 Dec. 2025.

  23. Rasp, S., and Coauthors, 2024: WeatherBench 2: A benchmark for the next generation of data-driven global weather models. Journal of Advances in Modeling Earth Systems, 16 (6), e2023MS004019, doi:10.1029/2023MS004019.

  24. Shrestha, B., J. A. Brotzge, and J. Wang, 2022: Evaluation of the New York State Mesonet profiler network data. Atmospheric Measurement Techniques, 15, 6011-6033, doi:10.5194/amt-15-6011-2022.

  25. Shrestha, B., J. A. Brotzge, J. Wang, N. Bain, C. D. Thorncroft, E. Joseph, J. Freedman, and S. Perez, 2021: Overview and applications of the New York State Mesonet profiler network. Journal of Applied Meteorology and Climatology, 60, 1591-1611, doi:10.1175/JAMC-D-21-0104.1.

  26. Swain, M., J. C. Peña, R. Bornstein, and J. Gonzalez, 2025: Coastal and anthropogenic heat impacts on PBL processes during extreme summer thunderstorm precipitation in New York City. Urban Climate, 62, doi:10.1016/j.uclim.2025.102534.

  27. Tang, S., C. Li, P. Zhang, and R. Tang, 2023: SwinLSTM: Improving spatiotemporal prediction accuracy using Swin Transformer and LSTM. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 13424-13433, doi:10.1109/ICCV51070.2023.01239.

  28. Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, 2017: Attention is all you need. Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., Vol. 30, https://proceedings.neurips.cc/paper_fil...

  29. Zhang, Y., 2025: Application of LSTM and Transformer hybrid model for electricity consumption forecasting. Journal of Energy Research and Reviews, 17 (6), 71-87, doi:10.9734/jenrr/2025/v17i6423.