pith. machine review for the scientific record.

arxiv: 2605.00367 · v1 · submitted 2026-05-01 · 💻 cs.CV


Flow matching for Sentinel-2 super-resolution: implementation, application, and implications


Pith reviewed 2026-05-09 19:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords flow matching · super-resolution · Sentinel-2 · satellite imagery · land cover classification · generative modeling · NAIP

The pith

A flow matching model achieves higher pixel accuracy than diffusion models for Sentinel-2 super-resolution in a single sampling step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a flow matching generative model to super-resolve Sentinel-2 10-meter imagery to 2.5 meters over the conterminous United States, trained on more than 120,000 same-day image pairs with resampled NAIP reference data. The model surpasses diffusion and Real-ESRGAN baselines in pixel-wise accuracy using only one Euler step and produces perceptually realistic outputs in 20 steps with a second-order solver, allowing users to adjust the perception-distortion balance at inference without retraining. The approach underpins a full-CONUS 2.5-meter 4-band product and raises land-cover classification accuracy to 89.11 percent on a Chesapeake Bay watershed test set. The result matters because it offers a fast, practical route to finer spatial detail from widely available medium-resolution satellite data for mapping and monitoring tasks.
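The pixel-wise comparisons behind this claim are reported as PSNR (see Figure 5). A minimal sketch of that metric, assuming imagery scaled to a known peak reflectance value; this is a generic definition, not the authors' evaluation code:

```python
import numpy as np

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    super-resolved estimate, both scaled to [0, max_val]. Higher is better."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

The usual protocol is per-tile PSNR averaged over a held-out test set, which is presumably how the single-step Euler results were scored.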

Core claim

The flow matching model outperforms diffusion and Real-ESRGAN models in pixel-wise accuracy for 4x super-resolution of Sentinel-2 visible and near-infrared bands in a single Euler sampling step; with a second-order Midpoint solver it generates perceptually realistic super-resolved imagery in only 20 sampling steps, enabling a 2.5-meter CONUS imagery product and a yearly 2.5-meter land-cover product for the Chesapeake Bay watershed that reaches 89.11 percent overall accuracy against ground-truth points.
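The two sampling regimes in this claim differ only in the ODE solver applied to the learned velocity field. A minimal sketch of both samplers, with a toy linear `velocity` standing in for the trained, Sentinel-2-conditioned network (whose architecture this page does not specify):

```python
import numpy as np

def velocity(x, t, lowres):
    """Stand-in for the trained flow-matching network v_theta(x, t | low-res input).
    This toy field simply pulls the state toward the upsampled low-res image."""
    return lowres - x

def euler_sample(lowres, n_steps=1):
    """First-order Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = np.random.randn(*lowres.shape)  # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt, lowres)
    return x

def midpoint_sample(lowres, n_steps=20):
    """Second-order Midpoint solver: evaluate the field at a half-step first."""
    x = np.random.randn(*lowres.shape)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x_half = x + 0.5 * dt * velocity(x, t, lowres)
        x = x + dt * velocity(x_half, t + 0.5 * dt, lowres)
    return x
```

With the real model, the single Euler step favors pixel fidelity while roughly 20 Midpoint steps recover realistic texture; that solver swap is the inference-time perception-distortion control the claim describes.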

What carries the argument

Flow matching generative model trained on paired same-day 10-m Sentinel-2 and 2.5-m resampled NAIP imagery for 4x super-resolution.

Load-bearing premise

Same-day Sentinel-2 and resampled NAIP pairs supply unbiased, perfectly aligned training targets across all land-cover types and atmospheric conditions with no significant domain shift when the model processes annual composites.

What would settle it

An independent test set of same-day Sentinel-2 and NAIP pairs from years or regions held out of training, evaluated with the same pixel-wise and perceptual metrics, where the flow matching model no longer exceeds the diffusion or Real-ESRGAN baselines.

Figures

Figures reproduced from arXiv:2605.00367 by Dakota Hester, Juliana A. Araújo, Lucas B. Ferreira, Thainara M. A. Lima, Vitor S. Martins.

Figure 1. A high-level overview of the methodology used in this study.
Figure 2. Spatial distribution of Sentinel-2 and NAIP image pairs used for training and evaluating …
Figure 3. Reclassified CBLC data (left), locations of triplets used for training land cover classifi…
Figure 4. Visualization of the sampling procedure for flow matching super-resolution using the Euler …
Figure 5. Pixel-wise accuracy (PSNR) and perceptual similarity (LPIPS) performance of the diffu…
Figure 6. Comparison of Euler and Midpoint solvers for flow matching-based super-resolution in …
Figure 7. Visual comparison of super-resolved outputs from the flow matching model using the Euler …
Figure 8. 10-m Sentinel-2 imagery, synthetic 2.5-m super-resolved imagery produced by the flow …
Figure 9. Visual comparison of super-resolution outputs from a selection of the models evaluated …
Figure 10. Comparison of per-band regression plots for flow matching using the Euler solver with …
Figure 11. Examples of small-scale urban development visible in the synthetic 2.5-m imagery time …
Figure 12. CONUS-wide synthetic 2.5-m imagery for 2025 generated using the flow matching super…
Figure 13. Impact of increasing the number of sampling steps using iterative super-resolution meth…
Figure 14. Per-class changes in accuracy metrics compared to Lanczos-upsampled Sentinel-2 imagery …
Figure 15. Comparison of urban land cover classification outputs from SegFormer at 2.5-m using …
Figure 16. Examples of high-frequency land cover changes captured by our annual Chesapeake Bay …
Original abstract

Developing robust techniques for super-resolution of satellite imagery involves navigating commonly observed trade-offs between spectral fidelity and perceptual quality. In this work, we introduce a flow matching model for 4x super-resolution of 10-m Sentinel-2 visible and near-infrared bands over the conterminous United States (CONUS) using a dataset of 120,851 10-m Sentinel-2 and 2.5-m resampled NAIP imagery pairs acquired on the same day. Our results showed that the flow matching model outperformed diffusion and Real-ESRGAN models in pixel-wise accuracy in a single sampling step using the Euler method. When evaluated with a second-order Midpoint solver, our model generated perceptually realistic super-resolved imagery in only 20 sampling steps, effectively navigating the perception-distortion trade-off at inference time without retraining. We used this model to produce a super-resolved 2.5-m 4-band CONUS imagery product derived from 2025 10-m Sentinel-2 annual composites, consisting of over 1.58 trillion pixels. We further evaluated the use of super-resolved data on a land cover classification task using semantic segmentation models. Finally, we generated a yearly 2.5-m land cover product for the Chesapeake Bay watershed for 2020-2025. An accuracy assessment against 25,000 ground truth points revealed an overall accuracy of 89.11% for the annual land cover product. We conclude that flow matching is an effective generative modeling approach for super-resolution of Sentinel-2 imagery compared to diffusion and Generative Adversarial Network-based methods, and has strong implications for expanding access to high-resolution imagery for geospatial applications that demand fine spatial detail.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces a flow matching model for 4x super-resolution of Sentinel-2 visible and near-infrared bands, trained on 120,851 same-day paired 10 m Sentinel-2 and 2.5 m resampled NAIP images over CONUS. It reports that the model outperforms diffusion and Real-ESRGAN baselines in pixel-wise accuracy using a single Euler sampling step and produces perceptually realistic outputs with a second-order Midpoint solver in only 20 steps. The trained model is applied to generate a super-resolved 2.5 m 4-band CONUS product from 2025 annual composites (over 1.58 trillion pixels) and is evaluated on a downstream land-cover classification task, yielding an overall accuracy of 89.11% against 25,000 ground-truth points in the Chesapeake Bay watershed for 2020-2025.

Significance. If the performance claims and generalization hold, the work offers an efficient generative approach to satellite super-resolution that allows inference-time control over the perception-distortion trade-off without retraining. The sampling efficiency (single-step accuracy and 20-step perceptual quality) relative to diffusion models is a concrete strength, and the scale of the CONUS product plus the land-cover application demonstrate practical downstream utility for geospatial tasks requiring fine spatial detail.

major comments (3)
  1. [Abstract] Abstract: the headline claim that the flow-matching model 'outperformed diffusion and Real-ESRGAN models in pixel-wise accuracy' rests on single reported numbers with no error bars, no ablation details, and no statistical tests; this is load-bearing for the central effectiveness conclusion.
  2. [Abstract] Abstract: the CONUS-scale product and 89.11% land-cover accuracy are obtained by applying the model to 2025 annual composites, yet the training data consist exclusively of same-day Sentinel-2/NAIP pairs; no quantitative validation (histogram matching, perceptual metrics on composite-like inputs, or held-out composite test set) is supplied to confirm that the learned transport map remains valid under the altered reflectance and textural statistics of annual composites.
  3. [Abstract] Abstract / Results: the manuscript states that the model 'effectively navigat[es] the perception-distortion trade-off at inference time' but provides no explicit spectral-fidelity metrics (e.g., SAM, ERGAS) alongside the perceptual or pixel-wise numbers, leaving the claimed balance unquantified.
minor comments (1)
  1. [Abstract] The dataset description gives the total number of pairs (120,851) but does not specify the train/validation/test split ratios or the geographic stratification used to ensure representativeness across CONUS land-cover types.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the detailed and constructive referee report. We appreciate the focus on strengthening the central claims and addressing potential limitations in generalization and metric completeness. We respond to each major comment below and outline revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim that the flow-matching model 'outperformed diffusion and Real-ESRGAN models in pixel-wise accuracy' rests on single reported numbers with no error bars, no ablation details, and no statistical tests; this is load-bearing for the central effectiveness conclusion.

    Authors: The abstract condenses the primary quantitative results, which are supported by detailed tables in the Results section comparing multiple metrics across sampling methods and baselines. To address the concern, we will revise the manuscript to include error bars (standard deviation across multiple random seeds or cross-validation folds) and statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the key pixel-wise accuracy comparisons. We will also add explicit cross-references to the ablation studies already present in the supplementary material. revision: yes

  2. Referee: [Abstract] Abstract: the CONUS-scale product and 89.11% land-cover accuracy are obtained by applying the model to 2025 annual composites, yet the training data consist exclusively of same-day Sentinel-2/NAIP pairs; no quantitative validation (histogram matching, perceptual metrics on composite-like inputs, or held-out composite test set) is supplied to confirm that the learned transport map remains valid under the altered reflectance and textural statistics of annual composites.

    Authors: This highlights a legitimate domain-shift consideration. The training pairs cover diverse CONUS conditions and seasons, and annual composites are constructed to approximate average reflectance; however, we did not supply explicit validation metrics on composite-style inputs in the original submission. In revision we will add a targeted analysis section that includes histogram comparisons and selected perceptual metrics evaluated on a small set of held-out annual composites where ground-truth high-resolution data can be obtained, or we will explicitly qualify the assumption and its potential limitations if full new experiments prove infeasible. revision: partial

  3. Referee: [Abstract] Abstract / Results: the manuscript states that the model 'effectively navigat[es] the perception-distortion trade-off at inference time' but provides no explicit spectral-fidelity metrics (e.g., SAM, ERGAS) alongside the perceptual or pixel-wise numbers, leaving the claimed balance unquantified.

    Authors: We agree that spectral-fidelity metrics would make the perception-distortion trade-off claim more complete. We will incorporate standard remote-sensing metrics such as Spectral Angle Mapper (SAM) and ERGAS into the main results tables for both single-step Euler and multi-step Midpoint sampling regimes, allowing direct comparison of spectral preservation with the existing pixel-wise and perceptual scores. revision: yes
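For context on the promised spectral-fidelity additions: Spectral Angle Mapper and ERGAS have standard definitions. A minimal sketch, assuming a band-first array layout and the 2.5 m / 10 m resolution ratio; neither is taken from the paper's code:

```python
import numpy as np

def sam_degrees(ref, est, eps=1e-12):
    """Mean Spectral Angle Mapper over pixels, in degrees.
    Arrays have shape (bands, H, W); 0 means identical spectral direction."""
    dot = np.sum(ref * est, axis=0)
    norms = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

def ergas(ref, est, ratio=2.5 / 10.0):
    """ERGAS: 100 * (high/low resolution ratio) *
    sqrt(mean over bands of (per-band RMSE / per-band reference mean)^2)."""
    rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(1, 2)))
    mean_ref = np.mean(ref, axis=(1, 2))
    return float(100.0 * ratio * np.sqrt(np.mean((rmse / mean_ref) ** 2)))
```

SAM is invariant to per-pixel brightness scaling, so reporting it alongside PSNR would separate spectral-shape errors from radiometric ones.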
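On the significance-testing point, a nonparametric alternative to the paired t-test or Wilcoxon options is a sign-flip permutation test on per-tile metric differences. The sketch below is illustrative only; the function name and protocol are not from the paper:

```python
import numpy as np

def paired_permutation_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on per-image metric differences
    (e.g., per-tile PSNR for flow matching vs. a diffusion baseline).
    Under the null, each paired difference is equally likely to flip sign."""
    rng = np.random.default_rng(seed)
    d = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null_means = np.abs((signs * d).mean(axis=1))
    # p-value: fraction of sign-flipped mean differences at least as extreme
    return float((null_means >= observed).mean())
```

Because it conditions on the observed per-tile differences, this test needs no distributional assumptions, which suits skewed metrics like LPIPS.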

Circularity Check

0 steps flagged

No circularity: empirical results rest on held-out evaluation and external ground truth

full rationale

The paper trains a flow-matching model on same-day Sentinel-2/NAIP pairs and reports pixel-wise accuracy, perceptual quality, and downstream land-cover accuracy via direct comparison against held-out test imagery and 25,000 independent ground-truth points. No equation, fitted parameter, or self-citation is shown to define the reported metrics by construction; the performance numbers and 89.11% accuracy are measured quantities, not tautological outputs of the training procedure itself. The extension to annual composites is presented as an application rather than a derived claim that loops back to the training distribution.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central empirical claims rest on a trained neural network whose weights are free parameters learned from the paired dataset; no additional ad-hoc constants or invented physical entities are introduced beyond standard generative-model training assumptions.

free parameters (1)
  • neural network weights and training hyperparameters
    Learned during optimization on the 120k image pairs; exact values and selection procedure not stated in abstract.

pith-pipeline@v0.9.0 · 5628 in / 1322 out tokens · 50982 ms · 2026-05-09T19:50:41.479401+00:00 · methodology


Reference graph

Works this paper leans on

55 extracted references · 54 canonical work pages · 5 internal anchors
