pith.

arxiv: 2505.18991 · v3 · pith:Y7GV6ZAS · submitted 2025-05-25 · 💻 cs.CV

Fast Kernel-Space Diffusion for Remote Sensing Pansharpening

Pith reviewed 2026-05-22 01:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords: pansharpening · diffusion models · remote sensing · kernel generation · low-rank tensors · multi-head attention · image fusion · fast inference

The pith

KSDiff shifts diffusion into kernel space, fusing satellite images more than 500 times faster than diffusion baselines while improving output quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pansharpening fuses high-resolution panchromatic images with low-resolution multispectral data to produce outputs rich in both spatial detail and spectral information. Existing deep learning methods often miss the broad statistical patterns in remote sensing scenes, while diffusion models that could capture those patterns run too slowly for real use. KSDiff moves the diffusion process into the generation of convolutional kernels that already embed global context from the data. These kernels are built by combining a low-rank core tensor generator with a unified factor generator under the direction of structure-aware multi-head attention. A two-stage training procedure lets the module drop into existing pansharpening networks, delivering higher-quality results at more than 500 times the speed of prior diffusion baselines.

Core claim

KSDiff constructs convolutional kernels enriched with global context through the integration of a low-rank core tensor generator and a unified factor generator, orchestrated by a structure-aware multi-head attention mechanism. This kernel-space diffusion approach, supported by a two-stage training strategy, allows integration into standard pansharpening pipelines while capturing global priors inherent in remote sensing data distributions.
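
The core claim names a structure-aware multi-head attention mechanism but does not say what makes it structure-aware. For orientation only, the sketch below shows plain multi-head scaled dot-product attention in the sense of Vaswani et al. (2017), which appears in the reference list. It assumes tokens are plain lists of floats and omits learned projections; the structure-aware component of KSDiff is not modeled, and none of the function names come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head; each argument is a list of token vectors."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def multi_head_attention(queries, keys, values, num_heads):
    """Split channels into equal heads, attend per head, and concatenate the results.

    Learned Q/K/V and output projections are omitted to keep the sketch small.
    """
    dim = len(queries[0])
    if dim % num_heads:
        raise ValueError("channel count must be divisible by num_heads")
    head_dim = dim // num_heads
    per_head = []
    for h in range(num_heads):
        cols = slice(h * head_dim, (h + 1) * head_dim)
        per_head.append(attention([t[cols] for t in queries],
                                  [t[cols] for t in keys],
                                  [t[cols] for t in values]))
    # Concatenate the head outputs token by token.
    return [[x for head in per_head for x in head[i]] for i in range(len(queries))]


tokens = [[1.0, 0.0, 0.5, 0.2], [0.1, 0.9, 0.3, 0.4], [0.6, 0.6, 0.0, 1.0]]
print(multi_head_attention(tokens, tokens, tokens, num_heads=2))
```

Splitting channels into heads lets each head weight the tokens differently; any structural prior would have to enter through the scores, which this plain version leaves unmodified.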

What carries the argument

low-rank core tensor generator and unified factor generator orchestrated by structure-aware multi-head attention to produce global-context-enriched convolutional kernels
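
The pith does not give the equations that combine the core tensor with the factors. The reference list includes Tucker's three-mode factor analysis and Kolda and Bader's survey of tensor decompositions, so a Tucker-style expansion is one plausible reading. The sketch below assumes that reading: a small core tensor is multiplied along each mode by a factor matrix to produce a full convolution kernel of shape (C_out, C_in, k_h, k_w). The shapes, ranks, and function names are hypothetical; in KSDiff the core and factors would presumably come from the diffusion-driven generator and the unified factor generator, which this sketch does not model.

```python
import math
from itertools import product


def mode_product(tensor, shape, matrix, mode):
    """n-mode product of a dense tensor with a matrix.

    The tensor is a dict mapping index tuples to floats; `matrix` is a list of
    rows with shape (new_size, shape[mode]).
    """
    if any(len(row) != shape[mode] for row in matrix):
        raise ValueError("factor width must equal the size of the contracted mode")
    new_shape = shape[:mode] + (len(matrix),) + shape[mode + 1:]
    out = {}
    for idx in product(*(range(n) for n in new_shape)):
        row = matrix[idx[mode]]
        out[idx] = sum(row[j] * tensor[idx[:mode] + (j,) + idx[mode + 1:]]
                       for j in range(shape[mode]))
    return out, new_shape


def tucker_kernel(core, core_shape, factors):
    """Expand a core tensor into a full kernel: G x_1 U_1 x_2 U_2 ... x_n U_n."""
    tensor, shape = core, core_shape
    for mode, factor in enumerate(factors):
        tensor, shape = mode_product(tensor, shape, factor, mode)
    return tensor, shape


def parameter_counts(full_shape, core_shape):
    """Values stored for a dense kernel versus its core plus factor matrices."""
    dense = math.prod(full_shape)
    low_rank = math.prod(core_shape) + sum(n * r for n, r in zip(full_shape, core_shape))
    return dense, low_rank


core_shape = (2, 2, 2, 2)  # hypothetical ranks for (C_out, C_in, k_h, k_w)
full_shape = (8, 4, 3, 3)
core = {idx: 1.0 for idx in product(*(range(r) for r in core_shape))}
factors = [[[0.1 * (i + j + 1) for j in range(r)] for i in range(n)]
           for n, r in zip(full_shape, core_shape)]
kernel, shape = tucker_kernel(core, core_shape, factors)
print(shape)                                            # (8, 4, 3, 3)
print(parameter_counts((64, 64, 3, 3), (8, 8, 3, 3)))   # (36864, 1618)
```

With the hypothetical ranks in the last line, the core plus factors hold 1,618 values instead of 36,864 for a dense 64×64×3×3 kernel, so whatever is generated per input is far smaller than the kernel it expands into.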

If this is right

  • Pansharpening outputs achieve higher quality than recent competing methods on standard evaluation metrics.
  • Inference runs more than 500 times faster than existing diffusion-based pansharpening models.
  • The module integrates into existing pansharpening architectures through a two-stage training procedure.
  • Global priors from remote sensing distributions are captured without direct pixel-space diffusion.
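
The plug-in claim suggests that generated kernels stand in for the static convolution weights of an existing pansharpening network. A minimal sketch of that reading, assuming the generator returns a (C_out, C_in, k, k) kernel for each input: the layer convolves its features with whatever kernel the generator produces. `generate_kernel` and `identity_generator` are hypothetical stand-ins, not KSDiff components.

```python
def conv2d_same(features, kernel):
    """Cross-correlate (C_in, H, W) features with a (C_out, C_in, k, k) kernel.

    Zero padding keeps the output at H x W; an odd k is assumed. No kernel flip is
    applied, matching the convention of common deep-learning frameworks.
    """
    c_in, height, width = len(features), len(features[0]), len(features[0][0])
    k = len(kernel[0][0])
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = k // 2
    output = []
    for weights in kernel:  # one output channel at a time
        if len(weights) != c_in:
            raise ValueError("kernel input channels must match the feature channels")
        plane = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for c in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            yy, xx = y + dy - pad, x + dx - pad
                            if 0 <= yy < height and 0 <= xx < width:
                                acc += weights[c][dy][dx] * features[c][yy][xx]
                plane[y][x] = acc
        output.append(plane)
    return output


def kernel_conditioned_layer(features, condition, generate_kernel):
    """Hypothetical plug-in layer: the weights are produced per input, not stored."""
    return conv2d_same(features, generate_kernel(condition))


def identity_generator(_condition):
    """Stand-in generator returning a 1x1x3x3 kernel that passes features through."""
    return [[[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]]


feats = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
print(kernel_conditioned_layer(feats, condition=None, generate_kernel=identity_generator))
```

Because the host network only sees a convolution, swapping a fixed weight for a generated one leaves the rest of the architecture unchanged, which is the property the integration claim relies on.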

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The kernel-generation strategy could be tested on other remote-sensing fusion tasks such as hyperspectral sharpening.
  • Real-time processing pipelines for satellite imagery streams may become feasible with this level of acceleration.
  • Tensor-factorization patterns from the method might transfer to efficiency improvements in related enhancement models.

Load-bearing premise

Integrating a low-rank core tensor generator and unified factor generator with structure-aware multi-head attention will reliably capture global priors in remote sensing data distributions while delivering the claimed inference speedup without quality loss.

What would settle it

Direct timing and quality measurements on standard remote sensing benchmark datasets that show less than a 500-fold inference speedup, or worse pansharpening scores than the diffusion baselines (higher SAM or ERGAS, lower Q2n or SCC), would falsify the central performance claims.
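
The timing half of that test is straightforward to script. A minimal sketch, assuming each model is exposed as a Python callable that fuses one PAN/LRMS pair; the function names are hypothetical, and the latencies in the usage lines are illustrative numbers, not measurements from the paper. Both models should be timed on the same hardware, batch size, and input size.

```python
import statistics
import time


def median_latency(fn, *args, warmup=2, repeats=5):
    """Median wall-clock seconds per call after a few warm-up calls.

    GPU models would also need a device synchronize before each timestamp,
    which this CPU-only sketch omits.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_seconds / candidate_seconds


def speed_claim_holds(baseline_seconds, candidate_seconds, claimed=500.0):
    """True when the measured speedup reaches the claimed factor."""
    return speedup(baseline_seconds, candidate_seconds) >= claimed


print(speedup(8.0, 0.015625))            # 512.0
print(speed_claim_holds(8.0, 0.03125))   # False (only 256x)
```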

Figures

Figures reproduced from arXiv: 2505.18991 by Hancong Jin, Jingjing Li, Liang-Jian Deng, Zihan Cao.

Figure 1: (a) Traditional DL-based methods directly learn a non…
Figure 2: Kernel Generator of our proposed KSDiff. The kernel generator comprises two sub-modules: (1) a diffusion model-driven…
Figure 3: Pyramid Latent Fusion Encoder (PLFE). The figure…
Figure 4: An overview of our two-stage training procedure and inference process. (a)…
Figure 5: Comparison of qualitative results for representative methods on the GF2 reduced-resolution dataset. The first row displays RGB…
Figure 6: Visualization of latent representations generated by KSDiff from PAN and LRMS images across different scenes.
Figure 7: Baseline network.
Appendix text (excerpt): … the main text. All core components, including the cross-attention mechanism and the Fusion-Gate module, remain identical to those in PLFE1. The role of PLFE2 differs from that of PLFE1. In PLFE1, the module encodes the ground-truth, PAN, and LRMS images to obtain a latent representation $z_0 \in \mathbb{R}^{N \times C_z}$. In contrast, PLFE2 produces a conditioning vector $c \in \mathbb{R}^{N \times C_z}$ with only the PAN and LRMS images as inputs…
Figure 8: (a) PLFE2. (b) The cross attention in latent encoders.
Appendix text (excerpt), 7. Details on Experiments, 7.1 Datasets: We conducted experiments using datasets derived from WorldView-3 (WV3), QuickBird (QB), and GaoFen-2 (GF2) satellite imagery. These datasets consist of image patches cropped from full remote sensing scenes and are partitioned into training and testing subsets. The WV3 dataset contains four images from two geograp…
Figure 9: Comparison of qualitative results for representative methods on the WV3 reduced-resolution dataset.
Figure 10: Comparison of qualitative results for representative methods on the WV3 full-resolution dataset.
Figure 11: Comparison of qualitative results for representative methods on the GF2 reduced-resolution dataset.
Figure 12: Comparison of qualitative results for representative methods on the GF2 full-resolution dataset.
Figure 13: Comparison of qualitative results for representative methods on the QB reduced-resolution dataset.
Figure 14: Comparison of qualitative results for representative methods on the QB full-resolution dataset.
Original abstract

Pansharpening seeks to fuse high-resolution panchromatic (PAN) and low-resolution multispectral (LRMS) images into a single image with both fine spatial and rich spectral detail. Despite progress in deep learning-based approaches, existing methods often fail to capture global priors inherent in remote sensing data distributions. Diffusion-based models have recently emerged as promising solutions due to their powerful distribution mapping capabilities; however, they suffer from heavy inference latency. We introduce KSDiff, a fast kernel-space diffusion framework that generates convolutional kernels enriched with global context to enhance pansharpening quality and accelerate inference. Specifically, KSDiff constructs these kernels through the integration of a low-rank core tensor generator and a unified factor generator, orchestrated by a structure-aware multi-head attention mechanism. We further introduce a two-stage training strategy tailored for pansharpening, facilitating integration into existing pansharpening architectures. Experiments show that KSDiff outperforms recent promising methods while running inference over $500\times$ faster than diffusion-based pansharpening baselines. Ablation studies, visualizations, and further evaluations substantiate the effectiveness of our approach. Code will be released upon possible acceptance.
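
The abstract does not specify the diffusion formulation, and the reference list spans DDPM, DDIM, EDM, flow matching, and rectified flow. For orientation, the sketch below assumes DDPM-style forward noising (Ho et al., 2020), $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, applied to a compact latent $z_0 \in \mathbb{R}^{N \times C_z}$ (the latent named in the appendix text printed with Figure 7) flattened to a list. The linear schedule and the values of N and C_z are assumptions, not the paper's settings.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(z0, alpha_bar, rng):
    """Sample z_t ~ N(sqrt(alpha_bar) * z0, (1 - alpha_bar) * I) elementwise."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * z + noise * rng.gauss(0.0, 1.0) for z in z0]


rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
z0 = [0.5] * (16 * 64)  # hypothetical latent: N = 16 tokens, C_z = 64 channels
z_mid = q_sample(z0, alpha_bars[499], rng)
print(len(z_mid), round(alpha_bars[0], 4))  # 1024 0.9999
```

Each denoising step in such a setup touches only N·C_z numbers (1,024 in this hypothetical) rather than every pixel of every band, which is one intuition for why moving diffusion out of pixel space can cut latency; the actual speedup also depends on the number of sampling steps and the network size.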

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces KSDiff, a kernel-space diffusion framework for remote sensing pansharpening. It generates convolutional kernels enriched with global context via a low-rank core tensor generator integrated with a unified factor generator, orchestrated by structure-aware multi-head attention. A two-stage training strategy allows integration into existing pansharpening architectures. The authors report superior performance over recent methods together with over 500× faster inference than diffusion-based pansharpening baselines, supported by ablations and visualizations.

Significance. If the performance and speedup claims are substantiated with full quantitative evidence, the work would be significant for the field. It directly tackles the inference latency barrier that has limited diffusion models in remote-sensing applications, while proposing a kernel-generation approach to capture global priors. The two-stage training and plug-in design are practical strengths that could facilitate adoption.

major comments (2)
  1. [Method (low-rank core tensor generator and unified factor generator)] The low-rank core tensor generator is load-bearing for both the claimed global-prior capture and the 500× speedup. No derivation, approximation bound, or rank-selection analysis is provided showing that the chosen rank preserves the long-range spectral-spatial correlations required for accurate distribution mapping in remote-sensing data; if critical dependencies are discarded, the superior-performance claim would not hold even if runtime improves.
  2. [Experiments and results] The abstract states superior performance and 500× faster inference, yet the provided description contains no quantitative tables, specific metrics (PSNR/SSIM), datasets, error bars, or statistical tests. Full experimental results with baseline comparisons and ablation tables are required to verify the central claims.
minor comments (1)
  1. [Abstract and training strategy] Ensure all hyperparameters for the two-stage training and attention mechanism are explicitly listed for reproducibility; the promise to release code is noted positively.
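
The referee's request for error bars and statistical tests can be met from per-image scores. A minimal sketch using only Python's standard library, assuming two methods are scored on the same test images: `mean_std` gives the "value ± std" entry of a results table, and `paired_t` gives the paired t statistic with its degrees of freedom. The function names are hypothetical, and the scores in the usage lines are illustrative placeholders, not results from the paper. Converting t into a p-value needs the t distribution, which `scipy.stats.ttest_rel` provides; `scipy.stats.wilcoxon` covers the non-parametric alternative.

```python
import math
import statistics


def mean_std(values):
    """Mean and sample standard deviation, e.g. for a 'value ± std' table cell."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t(method_a, method_b):
    """Paired t statistic and degrees of freedom for per-image scores of two methods."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs))), len(diffs) - 1


# Illustrative per-image scores (not results from the paper).
print(paired_t([3.0, 4.0, 5.0, 6.0], [2.0, 4.0, 4.0, 5.0]))  # (3.0, 3)
print(mean_std([1.0, 2.0, 3.0]))                              # (2.0, 1.0)
```

Pairing by image removes between-image variation from the comparison, which is why it is preferred over comparing two independent means when both methods run on the same test set.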

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the recognition of the practical strengths of the two-stage training and plug-in design, as well as the potential significance if the performance and speedup claims are fully substantiated. We address each major comment below and outline the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Method (low-rank core tensor generator and unified factor generator)] The low-rank core tensor generator is load-bearing for both the claimed global-prior capture and the 500× speedup. No derivation, approximation bound, or rank-selection analysis is provided showing that the chosen rank preserves the long-range spectral-spatial correlations required for accurate distribution mapping in remote-sensing data; if critical dependencies are discarded, the superior-performance claim would not hold even if runtime improves.

    Authors: We acknowledge that the manuscript currently lacks a formal derivation, approximation bound, or explicit rank-selection analysis for the low-rank core tensor generator. The design choices were guided by empirical ablations showing that the selected rank maintains competitive performance, but we agree this is insufficient to rigorously demonstrate preservation of long-range spectral-spatial correlations. In the revised manuscript, we will add a new subsection under the method description that includes (i) a brief tensor-decomposition perspective on why the chosen rank is expected to retain key global priors and (ii) additional quantitative analysis (e.g., correlation-preservation metrics and performance sensitivity curves across ranks) to support the claim that critical dependencies are not discarded. revision: yes

  2. Referee: [Experiments and results] The abstract states superior performance and 500× faster inference, yet the provided description contains no quantitative tables, specific metrics (PSNR/SSIM), datasets, error bars, or statistical tests. Full experimental results with baseline comparisons and ablation tables are required to verify the central claims.

    Authors: The full manuscript contains an Experiments section with quantitative tables reporting PSNR, SSIM, SAM, and ERGAS on standard remote-sensing datasets (e.g., WorldView-3, GaoFen-2), direct comparisons against recent pansharpening and diffusion baselines, and ablation studies. However, we recognize that the presentation may not have been sufficiently prominent or complete for verification. In the revision we will (i) expand the main results table to include error bars from multiple random seeds, (ii) add a statistical significance analysis (paired t-tests or Wilcoxon tests) for the reported improvements, and (iii) ensure every claim in the abstract is explicitly cross-referenced to the corresponding table or figure. We will also move key ablation results into the main paper if they were previously in the supplement. revision: yes
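
SAM and ERGAS, named in the authors' response, have standard closed-form definitions. A minimal sketch, assuming images are stored as lists of pixels with one value per band: SAM averages the per-pixel spectral angle between reference and estimate (in degrees), and ERGAS aggregates each band's RMSE relative to its mean, scaled by the PAN-to-MS pixel-size ratio. The function names are illustrative; toolboxes differ in details such as zero-norm pixels and border cropping, so numbers from this sketch are not directly comparable with published tables.

```python
import math


def sam_degrees(reference, estimate, eps=1e-12):
    """Mean spectral angle in degrees; images are lists of pixels, each a list of band values."""
    angles = []
    for ref_px, est_px in zip(reference, estimate):
        dot = sum(r * e for r, e in zip(ref_px, est_px))
        norms = math.hypot(*ref_px) * math.hypot(*est_px)
        if norms < eps:
            continue  # zero-norm pixels have no defined angle; toolboxes handle them differently
        cosine = max(-1.0, min(1.0, dot / norms))
        angles.append(math.degrees(math.acos(cosine)))
    return sum(angles) / len(angles) if angles else 0.0


def ergas(reference, estimate, ratio=0.25):
    """ERGAS = 100 * ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2).

    `ratio` is the PAN-to-MS pixel-size ratio (1/4 for a 4x resolution gap).
    """
    num_pixels, num_bands = len(reference), len(reference[0])
    total = 0.0
    for b in range(num_bands):
        ref_band = [px[b] for px in reference]
        est_band = [px[b] for px in estimate]
        mse = sum((r - e) ** 2 for r, e in zip(ref_band, est_band)) / num_pixels
        band_mean = sum(ref_band) / num_pixels
        total += mse / band_mean ** 2
    return 100.0 * ratio * math.sqrt(total / num_bands)


ref = [[1.0, 2.0], [1.0, 2.0]]
print(ergas(ref, [[2.0, 4.0], [2.0, 4.0]]))               # 25.0
print(round(sam_degrees([[1.0, 0.0]], [[0.0, 1.0]]), 6))  # 90.0
```

The usage lines show why both metrics are reported: doubling every band leaves the spectral angle at zero but is heavily penalized by ERGAS, while rotating a spectrum is exactly what SAM detects.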

Circularity Check

0 steps flagged

No circularity: claims rest on experiments, not self-referential definitions or fits

full rationale

The paper presents KSDiff as an architectural proposal that integrates a low-rank core tensor generator, unified factor generator, and structure-aware multi-head attention to produce convolutional kernels for kernel-space diffusion. Performance claims (superior results and >500× speedup) are explicitly tied to experimental validation, ablation studies, and comparisons against baselines rather than any derivation that reduces by construction to fitted parameters, self-citations, or renamed inputs. No equations or steps in the described method equate a 'prediction' to its own training data or invoke a uniqueness theorem from the authors' prior work as load-bearing justification. The two-stage training strategy is framed as an integration aid, not a circular prediction. This is a standard empirical ML contribution whose central assertions remain independently testable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no specific free parameters, axioms, or invented entities beyond the high-level method components can be identified; the two-stage training and kernel generators are presented as methodological choices.

pith-pipeline@v0.9.0 · 5742 in / 1134 out tokens · 53728 ms · 2026-05-22T01:18:14.308407+00:00 · methodology

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 8 internal anchors

  1. Bruno Aiazzi, Luciano Alparone, Stefano Baronti, Andrea Garzelli, and Massimo Selva. MTF-tailored multiscale fusion of high-resolution MS and Pan imagery. Photogrammetric Engineering & Remote Sensing, 72(5):591–596, 2006.

  2. Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797, 2023.

  3. Alberto Arienzo, Gemine Vivone, Andrea Garzelli, Luciano Alparone, and Jocelyn Chanussot. Full-resolution quality …
     Table 2 (excerpt). Result on the GF2 reduced-resolution dataset. The best results are highlighted in bold and the second best results are underlined.
     Method | SAM (±std) | ERGAS (±std) | Q2n (±std) | SCC (±std)
     BDSD-PC [51] | 1.7110±0.0718 | 1.7025±0.0907 | 0… |

  4. Joseph W Boardman. Automating spectral unmixing of AVIRIS data using convex geometry concepts. In JPL, Summaries of the 4th Annual JPL Airborne Geoscience Workshop. Volume 1: AVIRIS Workshop, 1993.

  5. Zihan Cao, Shiqi Cao, Liang-Jian Deng, Xiao Wu, Junming Hou, and Gemine Vivone. Diffusion model with disentangled modulations for sharpening multispectral and hyperspectral images. Information Fusion, 104:102158, 2024.

  6. Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687, 2022.

  7. Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34:17695–17709, 2021.

  8. Liang-Jian Deng, Gemine Vivone, Cheng Jin, and Jocelyn Chanussot. Detail injection-based deep convolutional neural networks for pansharpening. IEEE Transactions on Geoscience and Remote Sensing, 59(8):6995–7010, 2020.

  9. Liang-Jian Deng, Gemine Vivone, Mercedes E Paoletti, Giuseppe Scarpa, Jiang He, Yongjun Zhang, Jocelyn Chanussot, and Antonio Plaza. Machine learning in pansharpening: A benchmark, from shallow to deep networks. IEEE Geoscience and Remote Sensing Magazine, 10(3):279–315, 2022.

  10. Yule Duan, Xiao Wu, Haoyu Deng, and Liang-Jian Deng. Content-adaptive non-local convolution for remote sensing pansharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27738–27747, 2024.

  11. Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.

  12. Andrea Garzelli and Filippo Nencini. Hypercomplex quality assessment of multi/hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 6(4):662–665, 2009.

  13. Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Yaowei Wang, and Xiu Li. HQG-Net: Unpaired medical image enhancement with high-quality guidance. IEEE Transactions on Neural Networks and Learning Systems, 2023.

  14. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

  15. Lin He, Yizhou Rao, Jun Li, Jocelyn Chanussot, Antonio Plaza, Jiawei Zhu, and Bo Li. Pansharpening via detail injection based convolutional neural networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(4):1188–1204, 2019.

  16. Xuanhua He, Ke Cao, Jie Zhang, Keyu Yan, Yingying Wang, Rui Li, Chengjun Xie, Danfeng Hong, and Man Zhou. Pan-Mamba: Effective pan-sharpening with state space model. Information Fusion, 115:102779, 2025.

  17. Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016.

  18. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

  19. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.

  20. Zi-Rong Jin, Tian-Jing Zhang, Tai-Xiang Jiang, Gemine Vivone, and Liang-Jian Deng. LAGConv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1113–1121, 2022.

  21. Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.

  22. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning, pages 5156–5165. PMLR, 2020.

  23. Diederik P Kingma, Max Welling, et al. Auto-encoding variational Bayes, 2013.

  24. Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

  25. P Kwarteng and A Chavez. Extracting spectral contrast in Landsat Thematic Mapper image data using selective principal component analysis. Photogramm. Eng. Remote Sens, 55(1):339–348, 1989.

  26. Xin Li, Yulin Ren, Xin Jin, Cuiling Lan, Xingrui Wang, Wenjun Zeng, Xinchao Wang, and Zhibo Chen. Diffusion models for image restoration and enhancement: a comprehensive survey. International Journal of Computer Vision, pages 1–31, …

  27. Yixun Liang, Ping Zhang, Yang Mei, and Tingqi Wang. PMACNet: Parallel multiscale attention constraint network for pan-sharpening. IEEE Geoscience and Remote Sensing Letters, 19:1–5, 2022.

  28. Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.

  29. Jiawei Liu, Qiang Wang, Huijie Fan, Yinong Wang, Yandong Tang, and Liangqiong Qu. Residual denoising diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2773–2783, …

  30. Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022.

  31. Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

  32. Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.

  33. Giuseppe Masi, Davide Cozzolino, Luisa Verdoliva, and Giuseppe Scarpa. Pansharpening by convolutional neural networks. Remote Sensing, 8(7):594, 2016.

  34. Qingyan Meng, Wenxu Shi, Sijia Li, and Linlin Zhang. PanDiff: A novel pansharpening method based on denoising diffusion probabilistic model. IEEE Transactions on Geoscience and Remote Sensing, 61:1–17, 2023.

  35. Xiangchao Meng, Jie Li, Huanfeng Shen, Liangpei Zhang, and Hongyan Zhang. Pansharpening with a guided filter based on three-layer decomposition. Sensors, 16(7):1068, 2016.

  36. Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, …

  37. Xavier Otazu, María González-Audícana, Octavi Fors, and Jorge Núñez. Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods. IEEE Transactions on Geoscience and Remote Sensing, 43(10):2376–2385, 2005.

  38. Siran Peng, Liang-Jian Deng, Jin-Fan Hu, and Yu-Wei Zhuo. Source-adaptive discriminative kernels based network for remote sensing pansharpening. In IJCAI, pages 1283–1289, …

  39. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

  40. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.

  41. Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, and Deyu Meng. Unsupervised hyperspectral pansharpening via low-rank diffusion model. Information Fusion, 107:102325, 2024.

  42. Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4713–4726, …

  43. Neta Shaul, Juan Perez, Ricky TQ Chen, Ali Thabet, Albert Pumarola, and Yaron Lipman. Bespoke solvers for generative flow models. arXiv preprint arXiv:2310.19075, 2023.

  44. Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, and Hongsheng Li. Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3531–3539, 2021.

  45. Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, …

  46. Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

  47. Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models, 2023.

  48. Jiangtong Tan, Jie Huang, Naishan Zheng, Man Zhou, Keyu Yan, Danfeng Hong, and Feng Zhao. Revisiting spatial-frequency information integration from a hierarchical perspective for panchromatic and multi-spectral image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25922–25931, 2024.

  49. [49]

    Some mathematical notes on three-mode factor analysis.Psychometrika, 31(3):279–311, 1966

    Ledyard R Tucker. Some mathematical notes on three-mode factor analysis.Psychometrika, 31(3):279–311, 1966. 4

  50. [50]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

  51. [51]

    Gemine Vivone. Robust band-dependent spatial-detail approaches for panchromatic sharpening. IEEE Transactions on Geoscience and Remote Sensing, 57(9):6421–6433, 2019.

  52. [52]

    Gemine Vivone, Rocco Restaino, and Jocelyn Chanussot. A regression-based high-pass modulation pansharpening approach. IEEE Transactions on Geoscience and Remote Sensing, 56(2):984–996, 2017.

  53. [53]

    Gemine Vivone, Rocco Restaino, and Jocelyn Chanussot. Full scale regression-based injection coefficients for panchromatic sharpening. IEEE Transactions on Image Processing, 27(7):3418–3431, 2018.

  54. [54]

    Lucien Wald. Data fusion: definitions and architectures: fusion of images of different spatial resolutions. Presses des MINES, 2002.

  55. [55]

    Lucien Wald, Thierry Ranchin, and Marc Mangolini. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogrammetric Engineering and Remote Sensing, 63(6):691–699, 1997.

  56. [56]

    Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 132(12):5929–5949, 2024.

  57. [57]

    Kai Wang, Dongwen Tang, Boya Zeng, Yida Yin, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, and Yang You. Neural network diffusion. arXiv preprint arXiv:2402.13144, 2024.

  58. [58]

    Yancong Wei, Qiangqiang Yuan, Xiangchao Meng, Huanfeng Shen, Liangpei Zhang, and Michael Ng. Multi-scale-and-depth convolutional neural network for remote sensed imagery pan-sharpening. In 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 3413–3416. IEEE, 2017.

  59. [59]

    Chen Wu, Bo Du, Xiaohui Cui, and Liangpei Zhang. A post-classification change detection method based on iterative slow feature analysis and Bayesian soft fusion. Remote Sensing of Environment, 199:241–255, 2017.

  60. [60]

    Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, and Tian-Jing Zhang. Dynamic cross feature fusion for remote sensing pansharpening. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14687–14696, 2021.

  61. [61]

    Zhong-Cheng Wu, Ting-Zhu Huang, Liang-Jian Deng, Jie Huang, Jocelyn Chanussot, and Gemine Vivone. LRTCFPan: Low-rank tensor completion based framework for pansharpening. IEEE Transactions on Image Processing, 32:1640–1655.

  62. [62]

    Zhong-Cheng Wu, Ting-Zhu Huang, Liang-Jian Deng, and Gemine Vivone. A framelet sparse reconstruction method for pansharpening with guaranteed convergence. Inverse Problems and Imaging, 17(6):1277–1300, 2023.

  63. [63]

    Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, and Luc Van Gool. DiffIR: Efficient diffusion model for image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13095–13105, 2023.

  64. [64]

    Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng, Guang Lin, Zihan Cao, Chao Li, and Qibin Zhao. Hyperspectral pansharpening via diffusion models with iteratively zero-shot guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12669–12678, 2025.

  65. [65]

    Junfeng Yang, Xueyang Fu, Yuwen Hu, Yue Huang, Xinghao Ding, and John Paisley. PanNet: A deep network architecture for pan-sharpening. In Proceedings of the IEEE International Conference on Computer Vision, pages 5449–5457, 2017.

  66. [66]

    Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, and Qingmin Liao. Deep learning for single image super-resolution: A brief review. IEEE Transactions on Multimedia, 21(12):3106–3121, 2019.

  67. [67]

    Xiaohui Yuan, Jianfang Shi, and Lichuan Gu. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Systems with Applications, 169:114417, 2021.

  68. [68]

    Zongsheng Yue, Jianyi Wang, and Chen Change Loy. ResShift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems, 36:13294–13307, 2023.

  69. [69]

    Kaiwen Zheng, Cheng Lu, Jianfei Chen, and Jun Zhu. DPM-Solver-v3: Improved diffusion ODE solver with empirical model statistics. Advances in Neural Information Processing Systems, 36:55502–55542, 2023.

  70. [70]

    Jie Zhou, Daniel L Civco, and John A Silander. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. International Journal of Remote Sensing, 19(4):743–757.

  71. [71]

    Linqi Zhou, Aaron Lou, Samar Khanna, and Stefano Ermon. Denoising diffusion bridge models. arXiv preprint arXiv:2309.16948, 2023.

  72. [72]

    Man Zhou, Jie Huang, Yanchi Fang, Xueyang Fu, and Aiping Liu. Pan-sharpening with customized transformer and invertible neural network. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3553–3561, 2022.

  73. [73]

    Man Zhou, Keyu Yan, Jinshan Pan, Wenqi Ren, Qi Xie, and Xiangyong Cao. Memory-augmented deep unfolding network for guided image super-resolution. International Journal of Computer Vision, 131(1):215–242, 2023.

Fast Kernel-Space Diffusion for Remote Sensing Pansharpening
Supplementary Material

Abstract

In this supplementary material, we fir...

Methods Explanation

6.1. Pansharpening Network

The baseline network we build is shown in Fig. 7. The panchromatic (PAN) image $P \in \mathbb{R}^{H \times W \times 1}$ is first duplicated along the channel dimension to match the number of channels in the low-resolution multispectral (LRMS) image $M \in \mathbb{R}^{H \times W \times C}$. The LRMS image is then subtracted from the duplicated PAN image, and the...
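The first step of the baseline network described above (channel duplication of the PAN image followed by a per-band difference with the LRMS image) can be sketched in NumPy. This is a minimal illustration, not the authors' code: it assumes channel-last arrays, assumes the LRMS image has already been brought to the PAN's H×W size (as the shared spatial shape in the text implies), and takes the residual to be the duplicated PAN minus the LRMS image. The function name `pan_minus_lrms` is hypothetical, and the rest of the network (truncated in the text) is not reproduced.

```python
import numpy as np


def pan_minus_lrms(pan: np.ndarray, lrms: np.ndarray) -> np.ndarray:
    """Hypothetical helper: duplicate a single-band PAN image across channels
    and subtract the LRMS image band by band.

    pan:  array of shape (H, W, 1)
    lrms: array of shape (H, W, C), assumed already at PAN resolution
    returns: array of shape (H, W, C) holding PAN - LRMS for each band
    """
    if pan.ndim != 3 or pan.shape[2] != 1:
        raise ValueError("pan must have shape (H, W, 1)")
    if lrms.ndim != 3 or lrms.shape[:2] != pan.shape[:2]:
        raise ValueError("lrms must have shape (H, W, C) matching pan spatially")
    num_bands = lrms.shape[2]
    # Duplicate the PAN band along the channel axis: (H, W, 1) -> (H, W, C).
    pan_dup = np.repeat(pan, num_bands, axis=2)
    # Per-band difference between the duplicated PAN and the LRMS image.
    return pan_dup - lrms


# Usage with a random 8-band patch (the band count is illustrative only).
pan = np.random.rand(64, 64, 1).astype(np.float32)
lrms = np.random.rand(64, 64, 8).astype(np.float32)
residual = pan_minus_lrms(pan, lrms)
print(residual.shape)  # (64, 64, 8)
```

Explicit `np.repeat` mirrors the "duplicated along the channel dimension" wording; plain broadcasting (`pan - lrms`) would give the same numbers without materializing the copy.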

Details on Experiments

7.1. Datasets

We conducted experiments using datasets derived from WorldView-3 (WV3), QuickBird (QB), and GaoFen-2 (GF2) satellite imagery. These datasets consist of image patches cropped from full remote sensing scenes and are partitioned into training and testing subsets. The WV3 dataset contains four images from two geographic lo...

Additional Results

8.1. Main Results

Tab. 8 and Tab. 9 present the quantitative performance benchmarks on the full-resolution GF2 and QB datasets. The results indicate that the proposed KSDiff method exhibits strong generalization capabilities across different data domains. Fig. 9 to Fig. 14 provide qualitative comparisons of visual outputs generated ...