A Hybrid LSTM-Vision Transformer Architecture for Predicting HRRR Forecast Errors
Pith reviewed 2026-06-26 21:20 UTC · model grok-4.3
The pith
A hybrid LSTM-Vision Transformer improves HRRR forecast error predictions by incorporating vertical atmospheric profiles from profilers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity; for precipitation the LSTM-ViT framework achieves approximately a twofold increase in predictive skill while better capturing convectively driven error evolution.
What carries the argument
The hybrid LSTM-Vision Transformer that fuses temporal sequence learning from surface observations with vertically informed attention mechanisms applied to atmospheric profiles.
If this is right
- Forecast error prediction skill increases most at short lead times when vertical structure is supplied.
- Precipitation error forecasts show the largest relative gain and better track convective error sources.
- Degradation during enhanced PBL activity is reduced across temperature, wind, and precipitation predictions.
- The combined architecture supplies physically interpretable guidance on model bias for operational use.
Where Pith is reading between the lines
- The same profiler-augmented approach could be tested on other high-resolution NWP models that share similar PBL and convection error patterns.
- Attention weights might be inspected post-training to identify which vertical levels most influence error predictions during convective events.
- Extending the framework to additional surface variables or to regions with sparser profiler coverage would test whether the vertical information remains the dominant driver of gains.
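The attention-inspection idea above can be made concrete with a minimal sketch. It assumes plain single-head scaled dot-product attention over vertical levels; the paper's actual ViT configuration is not described here, and the function names, the query vector, the per-level key vectors, and the heights are all hypothetical placeholders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def level_attention(query, level_keys):
    """Scaled dot-product attention weights of one query over vertical levels."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in level_keys
    ]
    return softmax(scores)


def top_levels(weights, heights_m, k=3):
    """Heights (m) of the k levels that receive the largest attention weight."""
    ranked = sorted(zip(weights, heights_m), reverse=True)
    return [height for _, height in ranked[:k]]


# Hypothetical example: three profiler levels with 2-dimensional keys.
query = [1.0, 0.0]
keys = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
heights = [100, 500, 1000]
weights = level_attention(query, keys)
print(top_levels(weights, heights, k=2))
```

Averaging such weights over convective cases only, and comparing them with the average over all cases, would be one way to see whether particular levels dominate during convection.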
Load-bearing premise
The observed skill gains result from the vertical attention mechanisms capturing PBL and convective processes rather than from added model capacity or dataset effects.
What would settle it
An ablation in which the hybrid model and the baseline LSTM have matched total parameter counts and the profiler input is withheld from the hybrid, testing whether the skill advantage disappears.
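The capacity-matching step of such an experiment could be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the layer sizes are not reported in this summary, so every dimension below is hypothetical. The count uses the standard LSTM formula of 4·(h·(i + h) + h) parameters per layer with one bias vector per gate; some frameworks keep two bias vectors per gate, which adds 4h per layer.

```python
def lstm_param_count(input_size, hidden_size, num_layers=1):
    """Weights and biases in a stacked LSTM (one bias vector per gate)."""
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # Four gates, each with input weights, recurrent weights, and a bias.
        total += 4 * (hidden_size * (layer_input + hidden_size) + hidden_size)
        layer_input = hidden_size  # deeper layers read the previous hidden state
    return total


def match_hidden_size(target_params, input_size, num_layers=1):
    """Hidden size whose parameter count is closest to target_params."""
    best_h, best_gap = 1, None
    h = 1
    while True:
        count = lstm_param_count(input_size, h, num_layers)
        gap = abs(count - target_params)
        if best_gap is None or gap < best_gap:
            best_h, best_gap = h, gap
        if count >= target_params:
            # The count grows with h, so larger sizes only widen the gap.
            return best_h
        h += 1


# Hypothetical numbers: a hybrid with 250,000 parameters, 12 surface features.
hidden = match_hidden_size(250_000, input_size=12, num_layers=2)
print(hidden, lstm_param_count(12, hidden, num_layers=2))
```

Matching on parameter count alone does not equalize compute, so reporting FLOPs alongside it, as the referee requests, would still be needed.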
Original abstract
Forecast errors in high-resolution numerical weather prediction (NWP) systems are often linked to unresolved planetary boundary layer (PBL) processes, convection, terrain-induced circulations, and other vertically structured atmospheric phenomena. Previous work demonstrated that Long Short-Term Memory (LSTM) networks can successfully predict forecast errors in the High-Resolution Rapid Refresh (HRRR) model using mesonet observations, but we believe performance degradation is linked to periods of complex vertical atmospheric evolution. To address this limitation, we develop a hybrid LSTM-Vision Transformer (LSTM-ViT) framework that combines temporal sequence learning from surface observations with atmospheric profiles from the New York State Mesonet profiler network. The LSTM-ViT framework is trained to predict HRRR hourly precipitation, 10 m wind speed, and 2 m temperature forecast errors at individual mesonet stations. Across all three predictors, incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity. Improvements are particularly pronounced for precipitation forecast error, where the LSTM-ViT framework achieves approximately a twofold increase in predictive skill relative to the baseline LSTM while better capturing convectively driven error evolution and reducing degradation associated with PBL processes. These results demonstrate that combining temporal sequence learning with vertically informed attention mechanisms provides a physically meaningful pathway for improving forecast error prediction in operational NWP systems. Our research offers forecasters enhanced guidance regarding model bias and forecast confidence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a hybrid LSTM-Vision Transformer (LSTM-ViT) model that fuses temporal learning from surface mesonet observations with vertical atmospheric profiles from the New York State Mesonet profiler network to predict HRRR forecast errors in hourly precipitation, 10 m wind speed, and 2 m temperature. It reports that adding profiler-derived structure improves skill over a baseline LSTM across all three variables, with the largest gains at short lead times and during enhanced PBL activity; precipitation error prediction shows an approximately twofold skill increase while better capturing convective error evolution.
Significance. If the skill gains are shown to arise specifically from the vertically informed attention rather than capacity or data-volume effects, the work would offer a concrete, physically grounded route to reduce NWP error prediction degradation during complex PBL and convective regimes, with potential value for operational forecast guidance.
major comments (3)
- [Results / experimental design] The central attribution of the reported twofold precipitation skill gain and reduced PBL degradation to the LSTM-ViT's vertically informed attention (abstract and results) rests on a comparison solely to an untuned baseline LSTM; no parameter counts, FLOPs, or capacity-matched controls (e.g., deeper LSTM or LSTM with duplicated surface inputs) are described, leaving open the possibility that gains reflect increased model expressivity or input richness rather than the ViT mechanism.
- [Results] Post-hoc stratification on PBL-active periods (abstract) introduces selection dependence; without a pre-specified ablation that isolates the profiler profiles or ViT encoder on the full dataset, the mechanistic link between attention on vertical structure and the observed improvements cannot be isolated from dataset-specific effects.
- [Abstract / Results] No error bars, statistical significance tests, or explicit train-test split details are provided for the quantitative claims (abstract), weakening the reliability of the reported skill increases.
minor comments (1)
- [Methods] Hyperparameter tuning details and the exact definition of 'predictive skill' (e.g., which metric yields the twofold improvement) should be stated explicitly to allow reproduction.
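To illustrate why the metric definition matters, here is a minimal sketch of one common choice, an MSE-based skill score measured against a reference prediction. This is an assumption for illustration only: the summary does not say which metric the paper uses, and the function names and sample values are hypothetical.

```python
def mse(pred, obs):
    """Mean squared error between predictions and observed values."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def mse_skill_score(pred, reference, obs):
    """1 - MSE(pred) / MSE(reference): 1 is perfect, 0 matches the reference."""
    ref_mse = mse(reference, obs)
    if ref_mse == 0:
        raise ValueError("reference is perfect; skill score is undefined")
    return 1.0 - mse(pred, obs) / ref_mse


# Hypothetical forecast errors (obs) and two models' predictions of them.
obs = [0.4, -1.2, 2.0, 0.1]
zero_reference = [0.0] * len(obs)  # "always predict no forecast error"
baseline_pred = [0.1, -0.5, 0.9, 0.3]
hybrid_pred = [0.3, -1.0, 1.6, 0.2]
print(mse_skill_score(baseline_pred, zero_reference, obs))
print(mse_skill_score(hybrid_pred, zero_reference, obs))
```

Under a score like this, "twofold" could mean a doubled skill score or a halved error, and the two are not equivalent, which is why the referee's request for an explicit definition affects how the headline claim should be read.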
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of experimental design and statistical rigor. We address each major comment below and will incorporate revisions to strengthen the manuscript.
- Referee: [Results / experimental design] The central attribution of the reported twofold precipitation skill gain and reduced PBL degradation to the LSTM-ViT's vertically informed attention (abstract and results) rests on a comparison solely to an untuned baseline LSTM; no parameter counts, FLOPs, or capacity-matched controls (e.g., deeper LSTM or LSTM with duplicated surface inputs) are described, leaving open the possibility that gains reflect increased model expressivity or input richness rather than the ViT mechanism.
  Authors: We agree that the current baseline comparison does not fully isolate the contribution of the vertically informed attention mechanism from potential effects of model capacity or input richness. In the revised manuscript we will report parameter counts and FLOPs for the LSTM-ViT and baseline LSTM, and we will add a capacity-matched control experiment (e.g., a deeper LSTM or an LSTM receiving duplicated surface inputs). These additions will allow a clearer attribution of skill gains to the ViT component.
  Revision: yes
- Referee: [Results] Post-hoc stratification on PBL-active periods (abstract) introduces selection dependence; without a pre-specified ablation that isolates the profiler profiles or ViT encoder on the full dataset, the mechanistic link between attention on vertical structure and the observed improvements cannot be isolated from dataset-specific effects.
  Authors: We acknowledge that the PBL-active stratification was performed post-hoc. To address this, the revised manuscript will include a pre-specified ablation study performed on the full dataset that isolates the contribution of the profiler profiles and the ViT encoder. This will provide a more rigorous test of the mechanistic role of vertical structure.
  Revision: yes
- Referee: [Abstract / Results] No error bars, statistical significance tests, or explicit train-test split details are provided for the quantitative claims (abstract), weakening the reliability of the reported skill increases.
  Authors: We agree that the absence of error bars, significance testing, and explicit train-test split information limits the strength of the quantitative claims. In the revision we will add error bars to all reported metrics, conduct appropriate statistical significance tests, and provide full details of the train-test split procedure.
  Revision: yes
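One way the promised error bars could be produced is a paired bootstrap over test cases, sketched below. This is an assumption about a possible approach, not the authors' stated method, and the function and argument names are hypothetical. It resamples cases independently; hourly station data are typically autocorrelated, so a block bootstrap would likely be more appropriate in practice.

```python
import random


def paired_bootstrap_ci(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(|errors_a|) - mean(|errors_b|) over paired cases.

    errors_a and errors_b are the two models' prediction errors on the same
    test cases, in the same order.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    # Per-case difference in absolute error keeps the pairing intact.
    diffs = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lower, upper


# Hypothetical errors for the baseline LSTM (a) and the hybrid (b).
baseline_errors = [1.2, -0.8, 2.1, 0.5, -1.7, 0.9, 1.4, -0.3]
hybrid_errors = [0.7, -0.6, 1.5, 0.6, -1.1, 0.4, 1.0, -0.2]
print(paired_bootstrap_ci(baseline_errors, hybrid_errors))
```

An interval that excludes zero would suggest the difference in mean absolute error is unlikely to be resampling noise, within the limits of the independence assumption noted above.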
Circularity Check
No significant circularity; empirical ML training and held-out evaluation
The paper trains LSTM and LSTM-ViT models on mesonet surface and profiler data to predict HRRR forecast errors for precipitation, wind, and temperature, then reports skill metrics on held-out data. The central results (skill gains, especially for precipitation at short leads and PBL-active periods) are obtained via standard supervised training and test-set evaluation rather than any derivation that reduces by construction to fitted parameters or self-citations. No equations, uniqueness theorems, or ansatzes are invoked that collapse the claimed improvements to the inputs; the comparison to baseline LSTM is an external empirical benchmark. Hyperparameter tuning on the dataset is standard practice and does not constitute circularity under the defined patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- LSTM and ViT architecture hyperparameters
axioms (1)
- Domain assumption: Profiler vertical profiles from the New York State Mesonet accurately capture the atmospheric structure that drives HRRR forecast errors at surface stations.