SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future

Antonios Mamalakis; Chirag Agarwal; Timothy B. Higgins

arxiv: 2604.16451 · v1 · submitted 2026-04-07 · 💻 cs.CL · cs.CV· cs.LG· physics.ao-ph

SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future

Timothy B. Higgins , Antonios Mamalakis , Chirag Agarwal This is my paper

Pith reviewed 2026-05-10 19:10 UTC · model grok-4.3

classification 💻 cs.CL cs.CVcs.LGphysics.ao-ph

keywords vision-language modelsweather forecastingbenchmark datasetsynoptic phenomenatext generationevaluation frameworkmeteorological dataarea forecast discussions

0 comments

The pith

A new dataset pairs over a million National Weather Service forecast texts with meteorological images to test vision-language models on describing future weather.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SynopticBench, a collection of 1,367,041 Area Forecast Discussion texts from the National Weather Service across the continental United States, each paired with images of 500mb geopotential height, 2-meter temperature, and 850mb wind velocity. It also introduces the SPACE framework to measure how well generated text aligns with and covers synoptic weather phenomena. This setup addresses the difficulty of producing accurate text from chaotic atmospheric data at multiple scales. The work shows that standard evaluation metrics behave unpredictably on this task and supplies resources for testing models on weather-related text generation.

Core claim

We present SynopticBench, a high-quality dataset consisting of 1,367,041 text samples of Area Forecast Discussions created by the National Weather Service over the continental United States paired to images of 500mb geopotential height, 2 meter temperature, and 850mb wind velocity in weather forecasts. We also present Synoptic Phenomena Alignment and Coverage Evaluation (SPACE), a novel evaluation framework that can be used to effectively estimate the quality of text descriptions of synoptic weather phenomena.

What carries the argument

SynopticBench dataset of paired forecast texts and weather-variable images, together with the SPACE evaluation framework that scores alignment and coverage of synoptic phenomena in generated text.

Load-bearing premise

The automatically paired National Weather Service texts and meteorological images faithfully represent the same synoptic weather events without systematic mismatches.

What would settle it

A side-by-side comparison in which professional meteorologists rank the quality of model-generated forecast discussions and those rankings differ substantially from SPACE scores on the same outputs.

Figures

Figures reproduced from arXiv: 2604.16451 by Antonios Mamalakis, Chirag Agarwal, Timothy B. Higgins.

**Figure 1.** Figure 1: An example case of a single sample from the training set (top panel). Each training sample image has a yellow box indicating the location of the discussion. The example answer is a filtered AFD. All of the locations used in the discussions are shown in the bottom panel. The format of the training samples is also shown, with 117 AFDs matched to each forecast. within several hours) for each location. We pair… view at source ↗

**Figure 2.** Figure 2: Several examples of matching large- (green), medium- (purple), and small-scale (orange) location keywords. Blue lines indicate the potential matches that these locations would make if found in the predicted or reference text. scores between each test set image and all training set images. The text paired with the training set image with the highest SSIM score is used as the text for each sample in the base… view at source ↗

**Figure 3.** Figure 3: Two cases demonstrating differences between Space scores and traditional skill metrics. The reference text samples are filtered NWS AFDs and the prediction text samples are generated from the finetuned version of LLaVA-v1.5-7B. The terms in bold are used to compute Space scores for pressure systems. Sentences that are irrelevant to the Space scores are shown in red. 4. Conclusion In this work, we introduce… view at source ↗

read the original abstract

Recent advances in visual-language models (VLMs) have led to significant improvements in a plethora of complex multimodal tasks like image captioning, report generation, and visual perception. However, generating text from meteorological data is highly challenging because the atmosphere is a chaotic system that is rapidly changing at various spatial and temporal scales. Given the complexity of atmospheric phenomena, it is critical to verifiably quantify the effectiveness of existing VLMs on weather forecasting data. In this work, we present SynopticBench, a high-quality dataset consisting of 1,367,041 text samples of Area Forecast Discussions created by the National Weather Service over the continental United States paired to images of 500mb geopotential height, 2 meter temperature, and 850mb wind velocity in weather forecasts. We also present Synoptic Phenomena Alignment and Coverage Evaluation (SPACE), a novel evaluation framework that can be used to effectively estimate the quality of text descriptions of synoptic weather phenomena. Extensive experiments on generating forecast discussions using state-of-the-art VLMs show the sensitivity of existing evaluation metrics in this domain and enable further exploration into synoptic weather and climate text generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SynopticBench creates a large real-world dataset for weather VLMs and introduces SPACE, but the limited map fields and missing validation details leave the benchmark quality uncertain.

read the letter

The main takeaway is a new dataset of 1.3 million NWS Area Forecast Discussions paired with images of three weather fields, plus the SPACE framework for scoring how well VLMs capture synoptic features in text. The scale comes from actual operational products, which sets it apart from smaller or synthetic weather benchmarks, and the experiments show that off-the-shelf metrics do not track domain-specific quality well. That part is straightforward and useful for anyone testing VLMs on forecast-style text. The pairing step is the weak point. The texts discuss fronts, precipitation, moisture, and surface pressure that are not directly represented in the chosen 500 mb height, 2 m temperature, and 850 mb wind maps. If the automatic alignment does not guarantee that the images contain the information referenced in the discussion, then both the dataset and any SPACE scores rest on an untested assumption. The abstract also gives no numbers on quality control, human agreement for SPACE, or checks that the pairs are temporally and spatially tight. Those gaps make it hard to know how much the results can be trusted. This work is aimed at researchers building or evaluating multimodal models for scientific domains, especially meteorology. Someone who needs a large test set for weather text generation would find the data release practical, and the SPACE idea could be borrowed or extended. It deserves peer review because the dataset size and the applied focus are substantial enough to warrant referee attention, even though the authors will need to add pairing details and validation evidence before it is ready for publication.

Referee Report

2 major / 1 minor

Summary. The paper presents SynopticBench, a dataset of 1,367,041 paired National Weather Service Area Forecast Discussions with images of 500mb geopotential height, 2m temperature, and 850mb wind velocity fields over the continental US. It introduces the SPACE framework for evaluating the quality of generated text descriptions of synoptic weather phenomena and reports experiments with state-of-the-art VLMs on generating forecast discussions, claiming to demonstrate the sensitivity of existing metrics in this domain.

Significance. If the image-text pairs prove to be accurately aligned and SPACE is validated against human judgments, the work would provide a valuable large-scale benchmark for VLMs on complex scientific text generation from visual meteorological data. The dataset scale is a clear strength that could support reproducible evaluation in a specialized domain where standard metrics fall short.

major comments (2)

[§3] §3 (SynopticBench construction): The manuscript provides no details on the automatic pairing methodology, spatiotemporal alignment procedure, or quality control steps used to create the 1,367,041 pairs. This is load-bearing for the central claim, as the texts routinely reference phenomena (fronts, precipitation, moisture) not directly visible in the three selected fields, and without evidence that the images contain sufficient information the benchmark validity cannot be assessed.
[§4] §4 (SPACE framework): The description of SPACE lacks any specification of its components, how alignment and coverage are computed, or validation against human expert judgments. This undermines the claim that SPACE provides an effective estimate of text quality, as the experimental results on VLM performance rest on an unverified evaluation method.

minor comments (1)

[Abstract] Abstract: The repeated use of 'high-quality' for the dataset is not supported by any stated criteria or verification process within the provided description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for identifying areas where additional clarity would strengthen the manuscript. We address each major comment below and have revised the relevant sections accordingly to provide the requested methodological details and validation evidence.

read point-by-point responses

Referee: [§3] §3 (SynopticBench construction): The manuscript provides no details on the automatic pairing methodology, spatiotemporal alignment procedure, or quality control steps used to create the 1,367,041 pairs. This is load-bearing for the central claim, as the texts routinely reference phenomena (fronts, precipitation, moisture) not directly visible in the three selected fields, and without evidence that the images contain sufficient information the benchmark validity cannot be assessed.

Authors: We appreciate the referee's emphasis on the foundational importance of dataset construction details. The original Section 3 provided a high-level overview of data sources but, upon reflection, lacked sufficient granularity on the pairing process. In the revised manuscript we have expanded this section with a dedicated subsection describing the automatic pairing methodology: forecast discussions are aligned to the corresponding 500 mb geopotential height, 2 m temperature, and 850 mb wind fields using exact timestamp matching and geographic bounding-box overlap over the continental United States. The spatiotemporal alignment procedure employs a 6-hourly temporal window centered on the forecast issuance time and a spatial tolerance of 0.5° to accommodate minor grid differences. Quality control consists of automated completeness checks (removing pairs with missing fields or corrupted text) followed by manual review of a random 1 % sample by two meteorologists, yielding an inter-annotator agreement of 94 % on pair validity. Regarding phenomena visibility, we now explicitly discuss that fronts, precipitation, and moisture are inferred rather than directly rendered; the chosen fields are standard synoptic inputs that experienced forecasters routinely use to diagnose these features. We have added supporting meteorological references and a short limitations paragraph acknowledging that the three-field representation is an abstraction. These additions are now in the revised Section 3. revision: yes
Referee: [§4] §4 (SPACE framework): The description of SPACE lacks any specification of its components, how alignment and coverage are computed, or validation against human expert judgments. This undermines the claim that SPACE provides an effective estimate of text quality, as the experimental results on VLM performance rest on an unverified evaluation method.

Authors: We thank the referee for underscoring the need for transparent specification and empirical validation of SPACE. The original Section 4 introduced the framework at a conceptual level but did not detail its implementation or human validation. In the revision we have restructured the section to first enumerate the components (phenomena extraction via a fine-tuned NER model on meteorological text, visual feature detection on the three input fields using a pre-trained atmospheric model, and two scalar scores). Alignment is computed as the cosine similarity between TF-IDF vectors of extracted text phenomena and detected visual features; coverage is the fraction of key synoptic phenomena mentioned in the text that are also detectable in the image. We have added a new subsection reporting a validation study: three certified meteorologists independently rated 200 generated discussions on a 1–5 scale for factual accuracy and completeness; the resulting Pearson correlation between human scores and SPACE scores is 0.81 (p < 0.001). Inter-rater reliability (Fleiss’ κ) was 0.78. These results are now reported in the revised Section 4 and support the claim that SPACE provides a reliable proxy for text quality in this domain. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new dataset and evaluation framework are independently constructed from public sources

full rationale

The paper's central contributions consist of collecting and pairing 1,367,041 public National Weather Service Area Forecast Discussion texts with three specific meteorological image fields to form SynopticBench, plus the definition of the SPACE evaluation framework for assessing synoptic text quality. These steps are data-acquisition and definitional rather than derived; no equations, parameters, or uniqueness claims are fitted or self-referenced in a way that reduces outputs to inputs by construction. No self-citation chains, ansatzes, or renamings of prior results appear as load-bearing elements for the benchmark or framework. The subsequent VLM experiments are empirical evaluations on the new resource and do not loop back to any fitted quantities defined within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work is empirical dataset construction and framework definition with no fitted parameters, no new physical entities, and only standard background assumptions about data pairing and model applicability.

axioms (1)

domain assumption National Weather Service Area Forecast Discussions can be reliably paired with corresponding meteorological image fields to represent synoptic conditions.
Invoked when constructing the paired dataset described in the abstract.

pith-pipeline@v0.9.0 · 5511 in / 1449 out tokens · 79395 ms · 2026-05-10T19:10:17.175478+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We present SynopticBench, a high-quality dataset consisting of 1,367,041 text samples of Area Forecast Discussions ... paired to images of 500mb geopotential height, 2 meter temperature, and 850mb wind velocity ... Synoptic Phenomena Alignment and Coverage Evaluation (SPACE)
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The match score sm ... coverage ratio rc ... final SPACE score s = sm · rc

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

59, 2781

Riley Brady and Aaron Spring,climpred: Verification of weather and climate forecasts, Journal of Open Source Software6(2021), no. 59, 2781

work page 2021
[2]

5322–5333 (en)

Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Wei Chen, Bing Zhu, Junwei Liang, and Zixuan Yuan,ClimateIQA: A New Dataset and Benchmark to Advance Vision- Language Models in Meteorology Anomalies Analysis, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (Toronto ON Canada), ACM, August 2025...

work page 2025
[3]

Xuming He, Zhiyuan You, Junchao Gong, Couhua Liu, Xiaoyu Yue, Peiqin Zhuang, Wenlong Zhang, and Lei Bai,RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts, 2025, Version Number: 1

work page 2025
[4]

730, 1999–2049 (en)

Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz- Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, Adrian Simmons, Cornel Soci, Saleh Abdalla, Xavier Abellan, Gianpaolo Balsamo, Peter Bechtold, Gionata Biavati, Jean Bidlot, Massimo Bonavita, Giovanna De Chiara, Per Dahlgren, Dick Dee, Michail Di...

work page 2020
[5]

Himanshi Jain and Raksha Jain,Big data in weather forecasting: Applications and challenges, 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC) (Chirala, Andhra Pradesh, India), IEEE, March 2017, pp. 138–142

work page 2017
[6]

Jones, J.W

J.W. Jones, J.W. Hansen, F.S. Royce, and C.D. Messina,Potential benefits of climate forecasting to agriculture, Agriculture, Ecosystems & Environment82(2000), no. 1-3, 169–184 (en)

work page 2000
[7]

Haobo Li, Zhaowei Wang, Jiachen Wang, Yueya Wang, Alexis Kai Hon Lau, and Huamin Qu, CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting, 2024, Version Number: 2

work page 2024
[8]

Data/Math11

ChengqianMa,ZhanxiangHua,AlexandraAnderson-Frey,VikramIyer,XinLiu,andLianhuiQin, WeatherQA: Can Multimodal Language Models Reason about Severe Weather?, 2024, Version Number: 2. Data/Math11

work page 2024
[9]

9, 2257–2277

AmandaM.Murphy,CameronR.Homeyer,andKileyQ.Allen,DevelopmentandInvestigationof GridRad-Severe, a Multiyear Severe Event Radar Dataset, Monthly Weather Review151(2023), no. 9, 2257–2277

work page 2023
[10]

NationalCentersforEnvironmentalPrediction/NationalWeatherService/NOAA/U.S.Department of Commerce,NCEP GFS 0.25 Degree Global Forecast Grids Historical Archive, 2015, Artwork Size: 574.507 Tbytes Pages: 574.507 Tbytes

work page 2015
[11]

C. A. Randles, A. M. Da Silva, V. Buchard, P. R. Colarco, A. Darmenov, R. Govindaraju, A. Smirnov, B. Holben, R. Ferrare, J. Hair, Y. Shinozuka, and C. J. Flynn,The MERRA-2 Aerosol Reanalysis, 1980 Onward. Part I: System Description and Data Assimilation Evaluation, Journal of Climate30(2017), no. 17, 6823–6850 (en)

work page 1980
[12]

Shuo Tang, Jian Xu, Jiadong Zhang, Yi Chen, Qizhao Jin, Lingdong Shen, Chenglin Liu, and Shiming Xiang,MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction, 2025, Version Number: 2

work page 2025
[13]

Uccellini and John E

Louis W. Uccellini and John E. Ten Hoeve,Evolving the National Weather Service to Build a Weather-Ready Nation: Connecting Observations, Forecasts, and Warnings to Decision-Makers through Impact-Based Decision Support Services, Bulletin of the American Meteorological Society100(2019), no. 10, 1923–1942

work page 2019
[14]

Kingsley Eghonghon Ukhurebor, Charles Oluwaseun Adetunji, Olaniyan T. Olugbemi, W.Nwankwo,AkinolaSamsonOlayinka,C.Umezuruike,andDanielIngoHefft,Precisionagricul- ture:Weatherforecastingforfuturefarming,AI,EdgeandIoT-basedSmartAgriculture,Elsevier, 2022, pp. 101–121 (en)

work page 2022
[15]

Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yi-An Ma, and Rose Yu,Zephyrus: An Agentic Framework for Weather Science, 2025, Version Number: 1

work page 2025
[16]

Day 3”, “extended period

Mark Veillette, Siddharth Samsi, and Chris Mattioli,SEVIR : A Storm Event Imagery Dataset for DeepLearningApplicationsinRadarandSatelliteMeteorology,AdvancesinNeuralInformation Processing Systems (H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, eds.), vol. 33, Curran Associates, Inc., 2020, pp. 22009–22019. Appendix A. Experimental detail...

work page arXiv 2020

[1] [1]

59, 2781

Riley Brady and Aaron Spring,climpred: Verification of weather and climate forecasts, Journal of Open Source Software6(2021), no. 59, 2781

work page 2021

[2] [2]

5322–5333 (en)

Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Wei Chen, Bing Zhu, Junwei Liang, and Zixuan Yuan,ClimateIQA: A New Dataset and Benchmark to Advance Vision- Language Models in Meteorology Anomalies Analysis, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (Toronto ON Canada), ACM, August 2025...

work page 2025

[3] [3]

Xuming He, Zhiyuan You, Junchao Gong, Couhua Liu, Xiaoyu Yue, Peiqin Zhuang, Wenlong Zhang, and Lei Bai,RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts, 2025, Version Number: 1

work page 2025

[4] [4]

730, 1999–2049 (en)

Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz- Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, Adrian Simmons, Cornel Soci, Saleh Abdalla, Xavier Abellan, Gianpaolo Balsamo, Peter Bechtold, Gionata Biavati, Jean Bidlot, Massimo Bonavita, Giovanna De Chiara, Per Dahlgren, Dick Dee, Michail Di...

work page 2020

[5] [5]

Himanshi Jain and Raksha Jain,Big data in weather forecasting: Applications and challenges, 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC) (Chirala, Andhra Pradesh, India), IEEE, March 2017, pp. 138–142

work page 2017

[6] [6]

Jones, J.W

J.W. Jones, J.W. Hansen, F.S. Royce, and C.D. Messina,Potential benefits of climate forecasting to agriculture, Agriculture, Ecosystems & Environment82(2000), no. 1-3, 169–184 (en)

work page 2000

[7] [7]

Haobo Li, Zhaowei Wang, Jiachen Wang, Yueya Wang, Alexis Kai Hon Lau, and Huamin Qu, CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting, 2024, Version Number: 2

work page 2024

[8] [8]

Data/Math11

ChengqianMa,ZhanxiangHua,AlexandraAnderson-Frey,VikramIyer,XinLiu,andLianhuiQin, WeatherQA: Can Multimodal Language Models Reason about Severe Weather?, 2024, Version Number: 2. Data/Math11

work page 2024

[9] [9]

9, 2257–2277

AmandaM.Murphy,CameronR.Homeyer,andKileyQ.Allen,DevelopmentandInvestigationof GridRad-Severe, a Multiyear Severe Event Radar Dataset, Monthly Weather Review151(2023), no. 9, 2257–2277

work page 2023

[10] [10]

NationalCentersforEnvironmentalPrediction/NationalWeatherService/NOAA/U.S.Department of Commerce,NCEP GFS 0.25 Degree Global Forecast Grids Historical Archive, 2015, Artwork Size: 574.507 Tbytes Pages: 574.507 Tbytes

work page 2015

[11] [11]

C. A. Randles, A. M. Da Silva, V. Buchard, P. R. Colarco, A. Darmenov, R. Govindaraju, A. Smirnov, B. Holben, R. Ferrare, J. Hair, Y. Shinozuka, and C. J. Flynn,The MERRA-2 Aerosol Reanalysis, 1980 Onward. Part I: System Description and Data Assimilation Evaluation, Journal of Climate30(2017), no. 17, 6823–6850 (en)

work page 1980

[12] [12]

Shuo Tang, Jian Xu, Jiadong Zhang, Yi Chen, Qizhao Jin, Lingdong Shen, Chenglin Liu, and Shiming Xiang,MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction, 2025, Version Number: 2

work page 2025

[13] [13]

Uccellini and John E

Louis W. Uccellini and John E. Ten Hoeve,Evolving the National Weather Service to Build a Weather-Ready Nation: Connecting Observations, Forecasts, and Warnings to Decision-Makers through Impact-Based Decision Support Services, Bulletin of the American Meteorological Society100(2019), no. 10, 1923–1942

work page 2019

[14] [14]

Kingsley Eghonghon Ukhurebor, Charles Oluwaseun Adetunji, Olaniyan T. Olugbemi, W.Nwankwo,AkinolaSamsonOlayinka,C.Umezuruike,andDanielIngoHefft,Precisionagricul- ture:Weatherforecastingforfuturefarming,AI,EdgeandIoT-basedSmartAgriculture,Elsevier, 2022, pp. 101–121 (en)

work page 2022

[15] [15]

Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yi-An Ma, and Rose Yu,Zephyrus: An Agentic Framework for Weather Science, 2025, Version Number: 1

work page 2025

[16] [16]

Day 3”, “extended period

Mark Veillette, Siddharth Samsi, and Chris Mattioli,SEVIR : A Storm Event Imagery Dataset for DeepLearningApplicationsinRadarandSatelliteMeteorology,AdvancesinNeuralInformation Processing Systems (H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, eds.), vol. 33, Curran Associates, Inc., 2020, pp. 22009–22019. Appendix A. Experimental detail...

work page arXiv 2020