pith. sign in

arxiv: 2601.03633 · v2 · submitted 2026-01-07 · 💻 cs.CV · cs.AI

MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction

Pith reviewed 2026-05-16 16:57 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords precipitation nowcastingradar echo predictionrectified flowmulti-scale fusiongenerative modelspatiotemporal predictionradar sequencenowcasting
0
0 comments X

The pith

MFC-RFNet combines rectified flow training with multi-scale guided feature communication to enhance radar sequence prediction for precipitation nowcasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve accurate high-resolution precipitation nowcasting from radar echo sequences, a task important for disaster mitigation and economic planning. It tackles challenges like modeling complex multi-scale evolution, correcting inter-frame feature misalignment, and capturing long-range spatiotemporal context by proposing MFC-RFNet, a generative framework. The approach integrates multi-scale communication via the Feature Communication Module and Wavelet-Guided Skip Connection, spatial alignment through Condition-Guided Spatial Transform Fusion, and adopts rectified flow training for near-linear trajectories that support few-step sampling, while adding lightweight Vision-RWKV blocks for efficient long-range dependencies. Evaluations on four public datasets show consistent gains over baselines, with clearer echo morphology at higher rain rates and sustained skill at longer lead times.

Core claim

The central claim is that MFC-RFNet integrates multi-scale communication with guided feature fusion and adopts rectified flow training to learn near-linear probability-flow trajectories, enabling few-step sampling with stable fidelity, while specific modules like Wavelet-Guided Skip Connection preserve high-frequency details, Feature Communication Module promotes cross-scale interaction, Condition-Guided Spatial Transform Fusion corrects displacement, and Vision-RWKV blocks capture long-range context at low resolution, yielding clearer predictions on SEVIR, MeteoNet, Shanghai, and CIKM datasets.

What carries the argument

The synergy of rectified flow training with scale-aware communication via FCM, frequency-aware fusion via WGSC, spatial alignment via CGSTF, and Vision-RWKV blocks for spatiotemporal dependencies.

If this is right

  • Clearer echo morphology at higher rain-rate thresholds
  • Sustained prediction skill at longer lead times
  • Few-step sampling with stable fidelity
  • Better handling of complex multi-scale evolution and inter-frame displacements
  • Consistent gains across diverse public radar datasets

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar architectures could extend to other moving-pattern sequence tasks such as fluid flow or satellite cloud tracking
  • The few-step sampling property may support real-time operational nowcasting systems with limited compute
  • Placing RWKV blocks only at low resolutions suggests a path to scale the model to higher resolutions without linear compute growth

Load-bearing premise

The proposed synergy of RF training with scale-aware communication, spatial alignment, and frequency-aware fusion will produce robust gains without the specific module designs being overfit to the four chosen datasets.

What would settle it

Showing that on the SEVIR dataset the model produces no clearer echo morphology at higher rain-rate thresholds than strong baselines, or loses skill faster at longer lead times, would falsify the claim of consistent improvements.

Figures

Figures reproduced from arXiv: 2601.03633 by Chaorong Li, Chuanhu Deng, Qiang Yang, Rongyao Deng, Wenjie Luo.

Figure 1
Figure 1. Figure 1: Pareto plot of parameters, compute, and per￾formance on SEVIR. Vertical axis: CSI-M (%); horizontal axis: number of parameters (Million). The color bar encodes computational cost in GFLOPs—darker indicates higher cost. Deep learning-based radar sequence models have ad￾vanced nowcasting substantially. ConvLSTM [3] first demon￾strated clear gains over classical extrapolation on real radar data, and TrajGRU [… view at source ↗
Figure 2
Figure 2. Figure 2: MFC-RFNet overall architecture. Top: the conditional encoder extracts multi-scale features; FCM performs cross-scale communication; CGSTF aligns shallow features; WGSC modulates skip fusion. Bottom: the RF generator updates 𝑧𝑡 with few ODE steps (𝑧𝑡+Δ𝑡 = 𝑧𝑡 + 𝑣𝜃 Δ𝑡) to produce 𝐾 future frames, with VRWKV placed in deep layers to supply long-range context. In parallel, generative paradigms—from GANs to diff… view at source ↗
Figure 3
Figure 3. Figure 3: Feature Communication Module (FCM). Left: Multi-directional fusion—top–down (TD), bottom–up (BU), and lateral branches are built at every level; an SE head conditioned on 𝐅 lat 𝑖 produces weights 𝜶𝑖 that mix the three streams into 𝐅̂ 𝑖 . Middle: Pixel-wise cross-scale communication—all 𝐅̂ 𝑖 are first projected with 1×1 convolutions and bilinearly aligned to a reference resolution (𝐻⋆×𝑊 ⋆), concatenated, an… view at source ↗
Figure 4
Figure 4. Figure 4: Condition-Guided Spatial Transform Fusion (CGSTF). Conditioning features produce an offset field 𝐎 (bounded by tanh) to perturb the base grid (Grid𝑥 , Grid𝑦 ) and obtain Gridoffset. GridSample warps the main features to 𝐅warp, which is fused with the conditional features to yield 𝐅out. 3.3. Condition-Guided Spatial Transform Fusion (CGSTF) Background & design rationale. The evolution of pre￾cipitation syst… view at source ↗
Figure 5
Figure 5. Figure 5: The Wavelet-Guided Skip Connection (WGSC) architecture. Conditional features are processed by the Wavelet Processor (top inset) via 2D DWT to yield a frequency￾aware guidance map (𝐀wav). Concurrently (main path, bottom), spatial (SA) and channel (CA) attention masks are derived from concatenated encoder and decoder features. These components, along with 𝐀wav, inform the synthesis of three distinct gates. E… view at source ↗
Figure 6
Figure 6. Figure 6: VRWKV placement strategies at deep stages. Comparison of integration strategies for VRWKV blocks at the encoder tail, bottleneck, and first decoder layer. (a) Baseline architecture at these locations, without VRWKV. (b) Serial insertion with 𝑁 blocks per stage; specifically, the 𝑁 = 1 case corresponds to the KAN-VRWKV Block configuration used in MFC-RFNet. (c) Parallel integration via residual mixing (Add&… view at source ↗
Figure 7
Figure 7. Figure 7: CSI and HSS metrics at different prediction time steps for various methods on the SEVIR dataset. [52]); efficient video predictors (SimVP [53], FourCast￾Net [18]); Transformer families (EarthFormer [17], Earth￾Farseer [54]); a physics-informed design (PhyDNet [55]); a nowcasting-specific model (AlphaPre [56]); and recent generative approaches (NowcastNet [14], DiffCast [57]). Metrics include CSI at multipl… view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative comparisons on the SEVIR dataset. better preserved and spurious weak echoes are reduced. On Shanghai and MeteoNet, narrow bands remain more continuous; on CIKM, small cells show fewer fragmented artifacts. These observations are consistent with the intended roles of FCM, CGSTF, and WGSC in coordinating scales, aligning shallow features, and modulating skip fusion. From a cost perspective, the m… view at source ↗
Figure 9
Figure 9. Figure 9: Qualitative comparisons on the Shanghai dataset. performance gain ( [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Qualitative comparisons on the MeteoNet dataset. removing any component degrades performance, confirm￾ing their complementary roles. Disabling multi-directional fusion causes the most significant drop in high-threshold CSI (a decrease of about 8% for CSI-32), highlighting the critical importance of bidirectional information flow for coordinating global context and local details. Removing the cross-scale a… view at source ↗
Figure 11
Figure 11. Figure 11: Qualitative comparisons on the CIKM dataset [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
read the original abstract

Accurate and high-resolution precipitation nowcasting from radar echo sequences is crucial for disaster mitigation and economic planning, yet it remains a significant challenge. Key difficulties include modeling complex multi-scale evolution, correcting inter-frame feature misalignment caused by displacement, and efficiently capturing long-range spatiotemporal context without sacrificing spatial fidelity. To address these issues, we present the Multi-scale Feature Communication Rectified Flow (RF) Network (MFC-RFNet), a generative framework that integrates multi-scale communication with guided feature fusion. To enhance multi-scale fusion while retaining fine detail, a Wavelet-Guided Skip Connection (WGSC) preserves high-frequency components, and a Feature Communication Module (FCM) promotes bidirectional cross-scale interaction. To correct inter-frame displacement, a Condition-Guided Spatial Transform Fusion (CGSTF) learns spatial transforms from conditioning echoes to align shallow features. The backbone adopts rectified flow training to learn near-linear probability-flow trajectories, enabling few-step sampling with stable fidelity. Additionally, lightweight Vision-RWKV (RWKV) blocks are placed at the encoder tail, the bottleneck, and the first decoder layer to capture long-range spatiotemporal dependencies at low spatial resolutions with moderate compute. Evaluations on four public datasets (SEVIR, MeteoNet, Shanghai, and CIKM) demonstrate consistent improvements over strong baselines, yielding clearer echo morphology at higher rain-rate thresholds and sustained skill at longer lead times. These results suggest that the proposed synergy of RF training with scale-aware communication, spatial alignment, and frequency-aware fusion presents an effective and robust approach for radar-based nowcasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes MFC-RFNet, a generative model for radar echo sequence prediction in precipitation nowcasting. It integrates a rectified-flow backbone with a Wavelet-Guided Skip Connection (WGSC) for high-frequency preservation, a Feature Communication Module (FCM) for bidirectional multi-scale interaction, a Condition-Guided Spatial Transform Fusion (CGSTF) for inter-frame alignment, and lightweight Vision-RWKV blocks for long-range spatiotemporal modeling. The central claim is that this specific combination produces consistent quantitative and qualitative improvements over strong baselines on four public datasets (SEVIR, MeteoNet, Shanghai, CIKM), with clearer morphology at high rain-rate thresholds and better skill at longer lead times.

Significance. If the performance gains are shown to arise specifically from the claimed module synergies rather than from the RF backbone or RWKV alone, the work would offer a practical advance in nowcasting by combining probability-flow training with scale-aware fusion and alignment mechanisms. The design choices for efficient long-range modeling at low resolution are potentially reusable in other spatiotemporal forecasting tasks.

major comments (2)
  1. [Experimental section] Experimental section: the manuscript reports aggregate metric improvements versus baselines but provides no ablation tables that remove WGSC, FCM, CGSTF, or RWKV one at a time (or in combination) while keeping the RF training and encoder-decoder fixed. Without these controlled comparisons across all four datasets, lead times, and rain-rate thresholds, the attribution of gains to the proposed synergy remains unsupported.
  2. [Results and discussion] Results and discussion: no error bars, statistical significance tests, or per-component contribution tables are described for the claimed improvements at higher rain-rate thresholds and longer lead times. This makes it impossible to assess whether the reported gains are robust or dataset-specific.
minor comments (2)
  1. [Abstract] Abstract: the list of datasets and claimed benefits would be clearer if accompanied by the primary evaluation metrics (e.g., CSI, MSE, or SSIM) used to quantify 'consistent improvements'.
  2. Notation: ensure all module acronyms (WGSC, FCM, CGSTF, RWKV) are expanded on first use in the main text and that the precise placement of RWKV blocks is illustrated in a diagram.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our paper MFC-RFNet. The major comments point to the need for more rigorous experimental validation through ablations and statistical analyses. We address these points below and will make the necessary revisions to the manuscript.

read point-by-point responses
  1. Referee: [Experimental section] Experimental section: the manuscript reports aggregate metric improvements versus baselines but provides no ablation tables that remove WGSC, FCM, CGSTF, or RWKV one at a time (or in combination) while keeping the RF training and encoder-decoder fixed. Without these controlled comparisons across all four datasets, lead times, and rain-rate thresholds, the attribution of gains to the proposed synergy remains unsupported.

    Authors: We acknowledge the lack of ablation studies in the current version. To address this, we will add comprehensive ablation experiments in the revised manuscript. These will include removing each module (WGSC, FCM, CGSTF, RWKV) individually and in combinations, while keeping the rectified flow training and overall architecture fixed. Results will be reported across all four datasets, multiple lead times, and rain-rate thresholds to clearly attribute the performance gains to the proposed synergies. revision: yes

  2. Referee: [Results and discussion] Results and discussion: no error bars, statistical significance tests, or per-component contribution tables are described for the claimed improvements at higher rain-rate thresholds and longer lead times. This makes it impossible to assess whether the reported gains are robust or dataset-specific.

    Authors: We agree that including error bars, statistical tests, and per-component analyses would enhance the credibility of our results. In the revision, we plan to compute and report error bars (e.g., standard deviations over multiple runs or cross-validation folds), conduct statistical significance tests for the improvements at high rain-rate thresholds and longer lead times, and provide per-component contribution tables. This will help demonstrate the robustness and dataset-specific nature of the gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

full rationale

The paper introduces MFC-RFNet as a new architecture integrating rectified-flow training with WGSC, FCM, CGSTF, and RWKV modules, then reports empirical gains on four public datasets. No equations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed prediction or result to an input by construction. The central claims rest on external dataset evaluations rather than self-referential definitions or imported uniqueness theorems, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all technical details are deferred to the unavailable full manuscript.

pith-pipeline@v0.9.0 · 5595 in / 1180 out tokens · 38435 ms · 2026-05-16T16:57:55.978691+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 6 internal anchors

  1. [1]

    Guidelines for Nowcasting Techniques

    World Meteorological Organization. Guidelines for Nowcasting Techniques. WMO-No. 1198. WMO, Geneva, Switzerland, 2017

  2. [2]

    Wilson, Isztar Zawadzki, Sue P

    Juanzhen Sun, Ming Xue, James W. Wilson, Isztar Zawadzki, Sue P. Ballard, Jeanette Onvlee-Hooimeyer, Paul Joe, Dale M. Barker, Ping- Wah Li, Brian Golding, Mei Xu, and James Pinto. Use of nwp for nowcasting convective precipitation: Recent progress and challenges. BulletinoftheAmericanMeteorologicalSociety ,95(3):409–426,2014

  3. [3]

    Convolutional lstm network: A machine learning approachfor precipitationnowcasting

    Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-Chun Woo. Convolutional lstm network: A machine learning approachfor precipitationnowcasting. InAdvancesin Neural InformationProcessingSystems(NeurIPS) ,volume28,pages802–810, 2015

  4. [4]

    Deep learning for precipitation nowcasting: A benchmark and a new model

    XingjianShi,ZhihanGao,LeonardLausen,HaoWang,Dit-YanYeung, Wai-Kin Wong, and Wang-Chun Woo. Deep learning for precipitation nowcasting: A benchmark and a new model. InAdvances in Neural Information Processing Systems (NeurIPS), volume 30, 2017

  5. [5]

    arXiv preprint arXiv:2003.12140 , year=

    Casper Kaae Sønderby, Lasse Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, and Nal Kalchbrenner. Metnet: A neural weather model for precipitation forecasting.arXiv preprint arXiv:2003.12140, 2020

  6. [6]

    Deep learning for twelve hour precipitation forecasts.Nature Communica- tions, 13(1):1–10, 2022

    Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Rob Carver, Marcin Andrychowicz,JasonHickey,AaronBell,andNalKalchbrenner. Deep learning for twelve hour precipitation forecasts.Nature Communica- tions, 13(1):1–10, 2022. Wenjie Luo et al.:Preprint submitted to Elsevier Page 14 of 16 Short Title of the Article

  7. [7]

    Skilfulprecipitationnowcastingusingdeepgenerativemodelsofradar

    Suman Ravuri, Karel Lenc, Matthew Willson, Dmitry Kangin, Remi Lam, Piotr Mirowski, Megan Fitzsimons, Maria Athanassiadou, Sheleem Kashem, Sam Madge, Rachel Prudden, Amol Mandhane, Aidan Clark, Andrew Brock, Karen Simonyan, Raia Hadsell, Niall Robinson, Ellen Clancy, Alberto Arribas, and Shakir Mohamed. Skilfulprecipitationnowcastingusingdeepgenerativemod...

  8. [8]

    Prediff: Precipitation nowcasting with latent diffusion models

    Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, and Yuyang (Bernie) Wang. Prediff: Precipitation nowcasting with latent diffusion models. InAdvances in Neural Information Processing Systems (NeurIPS), volume 36, 2023

  9. [9]

    Residual denoising diffusion models

    Jiawei Liu, Qiang Wang, Huijie Fan, Yinong Wang, Yandong Tang, and Liangqiong Qu. Residual denoising diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2773–2783, 2024

  10. [10]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

  11. [11]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

  12. [12]

    Improving the training of rectified flows

    Sangyun Lee, Zinan Lin, and Giulia Fanti. Improving the training of rectified flows.arXiv preprint arXiv:2405.20320, 2024. NeurIPS 2024

  13. [13]

    An long short-term memory model with multi-scale context fusion and attention for radar echo extrapolation.Remote Sensing, 16(2):376, 2024

    Guangxin He, Haifeng Qu, Jingjia Luo, Yong Cheng, Jun Wang, and Ping Zhang. An long short-term memory model with multi-scale context fusion and attention for radar echo extrapolation.Remote Sensing, 16(2):376, 2024

  14. [14]

    Skilfulnowcasting ofextremeprecipitationwithnowcastnet

    Yuchen Zhang, Mingsheng Long, Kaiyuan Chen, Lanxiang Xing, RonghuaJin,MichaelIJordan,andJianminWang. Skilfulnowcasting ofextremeprecipitationwithnowcastnet. Nature,619(7970):526–532, 2023

  15. [15]

    Mo- tionrnn: A flexible model for video prediction with spacetime-varying motions

    Haixu Wu, Zhiyu Yao, Jianmin Wang, and Mingsheng Long. Mo- tionrnn: A flexible model for video prediction with spacetime-varying motions. InCVPR, 2021

  16. [16]

    Phydnet: Learning physics- guided dynamics for video prediction

    Vincent LeGuen and Nicolas Thome. Phydnet: Learning physics- guided dynamics for video prediction. InCVPR, 2020

  17. [17]

    Earthformer: Exploring space- time transformers for earth system forecasting.Advances in Neural Information Processing Systems, 35:25390–25403, 2022

    Zhihan Gao, Xingjian Shi, Hao Wang, Yi Zhu, Yuyang Bernie Wang, Mu Li, and Dit-Yan Yeung. Earthformer: Exploring space- time transformers for earth system forecasting.Advances in Neural Information Processing Systems, 35:25390–25403, 2022

  18. [18]

    FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

    Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja,AsheshChattopadhyay,MortezaMardani,ThorstenKurth,David Hall,Zongyi Li, KamyarAzizzadenesheli, etal. Fourcastnet:A global data-driven high-resolution weather model using adaptive fourier neural operators.arXiv preprint arXiv:2202.11214, 2022

  19. [19]

    Spatial transformer networks

    Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. InAdvances in Neural Information Processing Systems (NeurIPS), volume 28, 2015

  20. [20]

    Wavesnet: Wavelet integrated deep networks for image segmentation

    Wei Li and Linlin Shen. Wavesnet: Wavelet integrated deep networks for image segmentation. arXiv preprint arXiv:2005.14461, 2020. Wavelet-based U-Net variant for multi-resolution analysis

  21. [21]

    RWKV: Reinventing RNNs for the Transformer Era

    Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, MatteoGrella,andothers. Rwkv:Reinventingrnnsforthetransformer era. arXiv preprint arXiv:2305.13048, 2023

  22. [22]

    Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures.arXiv preprint arXiv:2403.02308,

    Bo Peng, Stella Biderman, Matteo Grella, and others. Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures. arXiv preprint arXiv:2403.02308, 2024

  23. [23]

    Memory in memory: A predictive neural networkforlearninghigher-ordernon-stationarityfromspatiotemporal dynamics

    Yunbo Wang, Jianjin Zhang, Hongyu Zhu, Mingsheng Long, Jianmin Wang, and Philip S Yu. Memory in memory: A predictive neural networkforlearninghigher-ordernon-stationarityfromspatiotemporal dynamics. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9154–9162, 2019

  24. [24]

    Pfst-lstm: A spatiotemporal lstm model with pseudoflow prediction for precipitation nowcasting

    Chuyao Luo, Xutao Li, and Yunming Ye. Pfst-lstm: A spatiotemporal lstm model with pseudoflow prediction for precipitation nowcasting. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:843–857, 2020

  25. [25]

    Ssa-unet: Advanced precipitation nowcasting via channel shuffling

    Marco Turzi and Siamak Mehrkanoon. Ssa-unet: Advanced precipitation nowcasting via channel shuffling. arXiv preprint arXiv:2504.18309, 2025

  26. [26]

    Rainhcnet: Hybrid high-low frequency and cross-scale network for precipitation nowcasting

    Lei Wang, Zheng Wang, Wenjun Hu, and Cong Bai. Rainhcnet: Hybrid high-low frequency and cross-scale network for precipitation nowcasting. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025

  27. [27]

    Aa-transunet: Attention augmented transunet for nowcasting tasks

    Yimin Yang and Siamak Mehrkanoon. Aa-transunet: Attention augmented transunet for nowcasting tasks. In2022 international joint conference on neural networks (IJCNN), pages 01–08. IEEE, 2022

  28. [28]

    Rainformer: Features extraction balanced network for radar-based precipitation nowcasting

    Cong Bai, Feng Sun, Jinglin Zhang, Yi Song, and Shengyong Chen. Rainformer: Features extraction balanced network for radar-based precipitation nowcasting. IEEE Geoscience and Remote Sensing Letters, 19:1–5, 2022

  29. [29]

    Tu2net-gan:Atemporalprecipitationnowcastingmodel with multiple decoding modules.Pattern Recognition Letters, 178:98– 105, 2024

    Xudong Ling, ChaoRong Li, Peng Yang, Yuanyuan Huang, and FengqingQin. Tu2net-gan:Atemporalprecipitationnowcastingmodel with multiple decoding modules.Pattern Recognition Letters, 178:98– 105, 2024

  30. [30]

    Precipitation nowcasting using diffusion transformer with causal attention.IEEE Transactions on Geoscience and Remote Sensing, 2024

    ChaoRong Li, XuDong Ling, YiLan Xue, Wenjie Luo, LiHong Zhu, FengQing Qin, Yaodong Zhou, and Yuanyuan Huang. Precipitation nowcasting using diffusion transformer with causal attention.IEEE Transactions on Geoscience and Remote Sensing, 2024

  31. [31]

    Extreme precipitation nowcasting using multi-task latent diffusion models.IEEE Transactions on Geoscience and Remote Sensing, 2025

    Chaorong Li, Xudong Ling, Qiang Yang, Mingxiang Chen, Fengqing Qin, and Yuanyuan Huang. Extreme precipitation nowcasting using multi-task latent diffusion models.IEEE Transactions on Geoscience and Remote Sensing, 2025

  32. [32]

    Instaflow: One step is enough for high-quality diffusion-based text-to-image generation

    Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, et al. Instaflow: One step is enough for high-quality diffusion-based text-to-image generation. In The Twelfth International Conference on Learning Representations, 2023

  33. [33]

    Accurate medium-range global weather forecasting with 3d neural networks.Nature, 619:533–538, 2023

    Kaifeng Bi et al. Accurate medium-range global weather forecasting with 3d neural networks.Nature, 619:533–538, 2023

  34. [34]

    Cross-layer feature pyramid transformer for small object detection in aerial images.IEEE Transactions on Geoscience and Remote Sensing, 2025

    Zewen Du, Zhenjiang Hu, Guiyu Zhao, Ying Jin, and Hongbin Ma. Cross-layer feature pyramid transformer for small object detection in aerial images.IEEE Transactions on Geoscience and Remote Sensing, 2025

  35. [35]

    Towards efficient use of multi-scale features in transformer-based object detectors

    Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, and Shijian Lu. Towards efficient use of multi-scale features in transformer-based object detectors. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6206– 6216, 2023

  36. [36]

    Multi-scale spatial-temporal transformer for meteorologi- cal variable forecasting.IEEE Transactions on Circuits and Systems for Video Technology, 2024

    Tian-BaoLi,Yu-TingSu,DanSong,Wen-HuiLi,Zhi-QiangWei,and An-An Liu. Multi-scale spatial-temporal transformer for meteorologi- cal variable forecasting.IEEE Transactions on Circuits and Systems for Video Technology, 2024

  37. [37]

    Internimage: Exploring large-scale vision foundation models with deformableconvolutions

    Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, et al. Internimage: Exploring large-scale vision foundation models with deformableconvolutions. InProceedingsoftheIEEE/CVFconference oncomputervisionandpatternrecognition ,pages14408–14419,2023

  38. [38]

    Spectformer: Frequency and attention is what you need in a vision transformer

    Badri N Patro, Vinay P Namboodiri, and Vijay S Agneeswaran. Spectformer: Frequency and attention is what you need in a vision transformer. In2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 9543–9554. IEEE, 2025

  39. [39]

    Gfnet:Globalfilternetworksforvisualrecognition

    Yongming Rao, Wenliang Zhao, Zheng Zhu, Jie Zhou, and Jiwen Lu. Gfnet:Globalfilternetworksforvisualrecognition. IEEETransactions on Pattern Analysis and Machine Intelligence, 45(9):10960–10973, 2023

  40. [40]

    Ewt: Efficient wavelet-transformer for single image denoising

    Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Jun Shi, and Tieyong Zeng. Ewt: Efficient wavelet-transformer for single image denoising. Neural Networks, 177:106378, 2024

  41. [41]

    Wavelet diffusion models are fast and scalable image generators

    Hao Phung, Quan Dao, and Anh Tran. Wavelet diffusion models are fast and scalable image generators. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10199– 10208, 2023

  42. [42]

    Statistical downscaling of precipitation in northwestern iran using a hybrid model of discrete wavelet transform, artificial neural networks, and quantile mapping

    Majid Taie Semiromi and Manfred Koch. Statistical downscaling of precipitation in northwestern iran using a hybrid model of discrete wavelet transform, artificial neural networks, and quantile mapping. Wenjie Luo et al.:Preprint submitted to Elsevier Page 15 of 16 Short Title of the Article Theoretical and Applied Climatology, 155(7):6591–6621, 2024

  43. [43]

    FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

    Tri Dao. Flashattention-2: Faster attention with better parallelism and work partitioning.arXiv preprint arXiv:2307.08691, 2023

  44. [44]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752, 2023

  45. [45]

    U-kan makes strong backbone for medical image segmentation and generation

    Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yifan Liu, Zhen Chen, and Yixuan Yuan. U-kan makes strong backbone for medical image segmentation and generation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 4652– 4660, 2025

  46. [46]

    Sevir: A storm event imagery dataset for deep learning applications in radar and satellite meteorology

    Mark Veillette, Siddharth Samsi, and Chris Mattioli. Sevir: A storm event imagery dataset for deep learning applications in radar and satellite meteorology. Advances in Neural Information Processing Systems, 33:22009–22019, 2020

  47. [47]

    Meteonet, an open reference weather dataset by meteo-france

    Gwennaëlle Larvor, Léa Berthomier, Vincent Chabot, Brice Le Pape, Bruno Pradel, and Lior Perez. Meteonet, an open reference weather dataset by meteo-france. 2020.URL https://www. kaggle. com/datasets/katerpillar/meteonet, 2020

  48. [48]

    Earthand Space Science, 7(2):e2019EA000812, 2020

    LeiChen,YuanCao,LeimingMa,andJunpingZhang.Adeeplearning- basedmethodologyforprecipitationnowcastingwithradar. Earthand Space Science, 7(2):e2019EA000812, 2020

  49. [49]

    Cikmanalyticup2017:short-termprecip- itation forecasting based on radar reflectivity images

    YichenYaoandZhongjieLi. Cikmanalyticup2017:short-termprecip- itation forecasting based on radar reflectivity images. InProceedings of the Conference on Information and Knowledge Management, Short- Term Quantitative Precipitation Forecasting Challenge, Singapore, pages 6–10, 2017

  50. [50]

    Pysteps: An open-source python library for probabilistic precipitation nowcast- ing (v1

    Seppo Pulkkinen, Daniele Nerini, Andrés A Pérez Hortal, Carlos Velasco-Forero, Alan Seed, Urs Germann, and Loris Foresti. Pysteps: An open-source python library for probabilistic precipitation nowcast- ing (v1. 0).Geoscientific Model Development, 12(10):4185–4219, 2019

  51. [51]

    Learning spatiotemporal features with 3dcnn and convgru for video anomaly detection

    Xin Wang, Weixin Xie, and Jiayi Song. Learning spatiotemporal features with 3dcnn and convgru for video anomaly detection. In2018 14thIEEEinternationalconferenceonsignalprocessing(ICSP) ,pages 474–479. IEEE, 2018

  52. [52]

    Mau: A motion-aware unit for video prediction and beyond.Advances in Neural Information Processing Systems, 34:26950–26962, 2021

    Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Yan Ye, Xiang Xinguang, and Wen Gao. Mau: A motion-aware unit for video prediction and beyond.Advances in Neural Information Processing Systems, 34:26950–26962, 2021

  53. [53]

    Simvp: Simpler yet better video prediction

    Zhangyang Gao, Cheng Tan, Lirong Wu, and Stan Z Li. Simvp: Simpler yet better video prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3170– 3180, 2022

  54. [54]

    Earthfarsser:Versatilespatio-temporal dynamicalsystemsmodelinginonemodel

    Hao Wu, Yuxuan Liang, Wei Xiong, Zhengyang Zhou, Wei Huang, ShilongWang,andKunWang. Earthfarsser:Versatilespatio-temporal dynamicalsystemsmodelinginonemodel. In ProceedingsoftheAAAI conference on artificial intelligence, volume 38, pages 15906–15914, 2024

  55. [55]

    Disentangling physical dynamics from unknown factors for unsupervised video prediction

    Vincent Le Guen and Nicolas Thome. Disentangling physical dynamics from unknown factors for unsupervised video prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11474–11484, 2020

  56. [56]

    Alphapre: Amplitude- phase disentanglement model for precipitation nowcasting

    Kenghong Lin, Baoquan Zhang, Demin Yu, Wenzhi Feng, Shidong Chen, Feifan Gao, Xutao Li, and Yunming Ye. Alphapre: Amplitude- phase disentanglement model for precipitation nowcasting. InPro- ceedings of the Computer Vision and Pattern Recognition Conference, pages 17841–17850, 2025

  57. [57]

    Diffcast: A unified framework via residual diffusion for precipitation nowcasting

    DeminYu,XutaoLi,YunmingYe,BaoquanZhang,ChuyaoLuo,Kuai Dai, Rui Wang, and Xunlai Chen. Diffcast: A unified framework via residual diffusion for precipitation nowcasting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27758–27767, 2024. Wenjie Luo et al.:Preprint submitted to Elsevier Page 16 of 16