pith. machine review for the scientific record.

arxiv: 2605.09455 · v1 · submitted 2026-05-10 · 💻 cs.CV

Recognition: no theorem link

Adaptive 3D Convolution for Remote Sensing Image Fusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:35 UTC · model grok-4.3

classification 💻 cs.CV
keywords: remote sensing image fusion · adaptive 3D convolution · hyperspectral imaging · content-aware kernels · spectral preservation · group convolution · deep learning fusion methods

The pith

Adaptive 3D convolution applies unique kernels to each voxel to fuse remote sensing images while preserving spectral detail better than 2D or fixed 3D methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that remote sensing image fusion suffers when treated as a 2D problem because spectral information gets encoded into feature channels and distorted. Treating the data as a true 3D volume helps, but standard 3D convolutions apply identical kernels everywhere and remain computationally heavy. The proposed solution generates a distinct 3D kernel for every voxel by first pulling spatial kernels from the high-resolution input and spectral kernels from the low-resolution input, then combining those kernels into content-aware filters and adding per-voxel biases. Group convolution keeps the approach efficient, and experiments on five datasets report state-of-the-art fusion quality.
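A minimal numpy illustration, ours rather than the paper's, of that distortion argument (the paper's Figure 2 makes it with rank(A) = 2 < L): when a linear channel encoding mixes L bands through a rank-deficient matrix, the null-space component of every spectrum is unrecoverable, even by the best linear decoder.

```python
# Illustrative only: a rank-deficient channel encoding of L = 3 spectral bands,
# echoing the rank(A) = 2 < L case in the paper's Figure 2.
import numpy as np

rng = np.random.default_rng(0)
L = 3                                   # spectral bands
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])         # 2 x 3 encoding matrix, rank 2 < L

spectra = rng.normal(size=(1000, L))    # 1000 pixels, one spectrum each
encoded = spectra @ A.T                 # what a 1x1 "2D" conv layer would emit

# Even the optimal linear decoder (the pseudo-inverse) cannot recover the
# component of each spectrum lying in the null space of A.
decoded = encoded @ np.linalg.pinv(A).T
rel_err = np.linalg.norm(decoded - spectra) / np.linalg.norm(spectra)
print(f"relative spectral error after encode/decode: {rel_err:.2f}")  # ~0.58
```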

Core claim

Ada3D produces and applies a unique set of 3D kernels to each input voxel by deriving spatial kernels from the high-resolution source and spectral kernels from the low-resolution source, combining them into content-aware 3D kernels, and supplementing the result with adaptive per-voxel biases, all while using group convolution to control cost. This per-voxel adaptivity captures fine-grained details and integrates spatial-spectral information more effectively than either channel-stacked 2D convolutions or uniform 3D convolutions.

What carries the argument

Adaptive 3D Convolution (Ada3D), a layer that creates a distinct 3D kernel for every voxel through a two-step spatial-spectral combination plus adaptive biases.
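A minimal PyTorch sketch of the per-voxel idea, under assumed shapes and a single hypothetical kernel generator. It is not the paper's exact design: Ada3D derives spatial kernels from the high-resolution source and spectral kernels from the low-resolution source, normalizes each kernel field, and works in groups (see its Figures 4 and 5); here one 1×1×1 generator stands in for that whole pipeline.

```python
# Sketch of per-voxel adaptive 3D convolution (assumed shapes and generators).
import torch
import torch.nn.functional as F

B, L, H, W, k = 2, 8, 16, 16, 3          # batch, bands, height, width, kernel size

x    = torch.randn(B, 1, L, H, W)        # input volume, bands kept as a real dimension
feat = torch.randn(B, 16, L, H, W)       # conditioning features (hypothetical)

kernel_gen = torch.nn.Conv3d(16, k ** 3, kernel_size=1)  # one k^3 kernel per voxel
bias_gen   = torch.nn.Conv3d(16, 1, kernel_size=1)       # one bias per voxel

kernels = kernel_gen(feat)               # (B, k^3, L, H, W)
biases  = bias_gen(feat)                 # (B, 1,   L, H, W)

# Gather the k x k x k neighborhood around every voxel of the input volume.
pad = k // 2
xp = F.pad(x, (pad,) * 6)                # pad W, H, and the band dimension
patches = (xp.unfold(2, k, 1)            # -> (B, 1, L, H, W, k, k, k)
             .unfold(3, k, 1)
             .unfold(4, k, 1)
             .reshape(B, 1, L, H, W, k ** 3))

# Per-voxel convolution: each neighborhood weighted by its own kernel.
kernels = kernels.permute(0, 2, 3, 4, 1).unsqueeze(1)    # (B, 1, L, H, W, k^3)
out = (patches * kernels).sum(-1) + biases               # (B, 1, L, H, W)
print(out.shape)                         # torch.Size([2, 1, 8, 16, 16])
```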

If this is right

  • Spectral distortions introduced by encoding spectral bands as 2D feature channels are reduced when spectral data is kept as an explicit dimension with adaptive 3D kernels.
  • Fixed-kernel 3D convolutions become sub-optimal because they cannot adjust to local content variations that Ada3D addresses with per-voxel kernels.
  • Group convolution allows full adaptivity without the full computational cost of naive per-voxel 3D filtering (a rough cost sketch follows this list).
  • The method reaches state-of-the-art quantitative and qualitative results across five remote sensing fusion datasets.
  • Per-voxel adaptive biases further refine detail recovery at each location in the fused output.
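Back-of-envelope arithmetic for the grouping point above, with illustrative numbers that are not the paper's: the kernel a generator must emit for each voxel shrinks by the group factor, which is what makes "a kernel per voxel" tractable at image scale.

```python
# Illustrative cost arithmetic (example sizes, not from the paper).
def per_voxel_kernel_elems(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Elements in one voxel's generated kernel for a (grouped) 3D conv."""
    return c_out * (c_in // groups) * k ** 3

c_in = c_out = 32
k = 3
voxels = 31 * 256 * 256                 # e.g. a 31-band 256x256 hyperspectral cube

full    = per_voxel_kernel_elems(c_in, c_out, k)             # groups = 1
grouped = per_voxel_kernel_elems(c_in, c_out, k, groups=32)  # depthwise-style

print(f"full per-voxel kernels:    {full * voxels / 1e9:.1f} G elements")   # ~56.2
print(f"grouped per-voxel kernels: {grouped * voxels / 1e9:.1f} G elements")  # ~1.8
```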

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The two-step kernel generation may extend naturally to other multi-modal fusion problems where data arrive in separate spatial and attribute dimensions.
  • Content-adaptive 3D processing could improve efficiency in downstream remote sensing tasks such as classification or change detection that rely on fused imagery.
  • Making 3D convolution both adaptive and grouped suggests a route to scaling spectral fusion to very large satellite archives.
  • Testing the same kernel-combination logic on non-remote-sensing volumetric data would clarify how far the adaptivity principle travels.

Load-bearing premise

That kernels built from separate spatial and spectral sources and applied per voxel will integrate information more accurately than fixed kernels without creating new artifacts.

What would settle it

A fusion dataset or test scene in which a standard 3D convolution or 2D method records lower reconstruction error or fewer visible spectral artifacts than Ada3D.

Figures

Figures reproduced from arXiv: 2605.09455 by Liang-Jian Deng, Shang-Qi Deng, Siran Peng, Xiangyu Zhu, Zhen Lei.

Figure 1: Comparison of Ada3D with several convolutional paradigms, includ…
Figure 2: Left: A visual representation of how 2D modeling introduces spectral distortions. (a) The vector space of spectral data, where the number of spectral bands L is set to 3 for simplicity. (b) The vector space of spectral information, which is encoded into feature map channels via a convolutional layer. This scenario illustrates the case where rank(A) = 2 < L, resulting in a loss of spectral information. Righ…
Figure 3: Graphical illustration of standard 2D convolution, adaptive 2D…
Figure 4: Graphical illustration of our network architecture, which consists of two main sections: a spatial branch and a spectral branch. The former focuses on…
Figure 5: The Ada3D block. Taking Fa and Fb as inputs, the kernel and bias generators produce adaptive kernels K and biases D. The acquired K and D are then applied to Fb, facilitating the integration of spatial and spectral information. The 1D and 2D convolutional layers use kernel sizes of 3 and 3 × 3, both with a stride and padding of 1. Normalization is applied over each k × k × k kernel field, and the number o…
Figure 6: Qualitative evaluation results for hyper-spectral pansharpening on the WDC, Botswana, and Pavia datasets. Row 1: Pseudo-color images of spectral…
Figure 7: Comparison of spectral vectors at three randomly selected spatial locations from a testing sample in the WDC dataset.
Figure 8: Qualitative evaluation results on a reduced-resolution example from the WV3 dataset, which belongs to the pansharpening task. Rows 1 and 3:…
Figure 9: Graphical illustration of Ada3D's applications.
Figure 11: Qualitative evaluation results of SOTA HISR methods on two testing…
read the original abstract

Remote sensing image fusion aims to create a high-resolution multi/hyper-spectral image from a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Recently, deep learning (DL) techniques have shown significant effectiveness in this area. Most DL-based methods approach image fusion as a 2D problem by encoding spectral information into feature map channels. However, our research suggests that this strategy introduces notable spectral distortions. In contrast, some methods consider spectral data as an additional dimension, utilizing standard 3D convolutions to preserve spectral information. Nevertheless, in a standard 3D convolutional layer, the same set of kernels is applied across all input regions, which we have found to be sub-optimal for image fusion. Furthermore, standard 3D convolutions necessitate substantial computational resources. To address these challenges, we propose a novel convolutional paradigm called Adaptive 3D Convolution (Ada3D) for remote sensing image fusion. Ada3D applies a unique set of 3D kernels to each input voxel, enabling the capture of fine-grained details. These adaptive kernels are generated through a two-step process: (i) spatial and spectral kernels are derived from their respective image sources; (ii) these two types of kernels are then combined to form content-aware 3D kernels that effectively integrate spatial and spectral information. Additionally, adaptive biases are introduced to enhance the convolutional outcome at the voxel level. Furthermore, we incorporate the group convolution technique to reduce computational complexity. As a result, Ada3D offers full adaptivity in an efficient manner. Evaluation results across five datasets demonstrate that our method achieves SOTA performance, underscoring the superiority of Ada3D. The code is available at https://github.com/PSRben/Ada3D.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Adaptive 3D Convolution (Ada3D) for remote sensing image fusion to address spectral distortions in 2D DL methods and the sub-optimality of fixed kernels plus high compute in standard 3D convolutions. Ada3D generates per-voxel content-aware 3D kernels via a two-step process (spatial kernels from high-resolution source, spectral kernels from low-resolution source, then combined) plus per-voxel adaptive biases, incorporates group convolution for efficiency, and claims SOTA results on five datasets.

Significance. If the superiority claims hold with proper validation, Ada3D could provide a practical advance in handling spatial-spectral fusion by enabling local adaptivity without full per-voxel kernel computation, potentially reducing artifacts while maintaining efficiency. The public code release supports reproducibility.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'Evaluation results across five datasets demonstrate that our method achieves SOTA performance' is unsupported by any reported metrics, baselines, ablation tables, or error analysis, which is load-bearing for the superiority assertion and prevents verification of whether the two-step kernel combination (rather than capacity or training differences) drives the gains.
  2. [Abstract] Abstract (method description): No quantitative comparison or ablation is described that isolates the effect of the two-step spatial-spectral kernel generation plus per-voxel biases against simpler alternatives such as standard 3D convolution, group-conv-only, or per-voxel adaptation without the combination step; this is required to substantiate that the specific construction is what improves integration over fixed kernels.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below. The full manuscript contains the supporting experiments, but we agree the abstract can be strengthened for clarity and will revise accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'Evaluation results across five datasets demonstrate that our method achieves SOTA performance' is unsupported by any reported metrics, baselines, ablation tables, or error analysis, which is load-bearing for the superiority assertion and prevents verification of whether the two-step kernel combination (rather than capacity or training differences) drives the gains.

    Authors: The abstract is a concise summary; the full paper reports the requested details in Section 4 (Experiments) and Section 5 (Ablations), with Tables 1–5 providing quantitative metrics (PSNR, SSIM, SAM, ERGAS; the fusion-specific ones are sketched after these responses), baseline comparisons, and error analysis across the five datasets. These tables directly support the SOTA claim and allow verification that gains arise from the Ada3D design rather than capacity alone. We will revise the abstract to include one or two key quantitative highlights (e.g., average PSNR improvement) to make the claim self-contained. revision: yes

  2. Referee: [Abstract] Abstract (method description): No quantitative comparison or ablation is described that isolates the effect of the two-step spatial-spectral kernel generation plus per-voxel biases against simpler alternatives such as standard 3D convolution, group-conv-only, or per-voxel adaptation without the combination step; this is required to substantiate that the specific construction is what improves integration over fixed kernels.

    Authors: The manuscript describes the two-step process in Section 3 and includes ablation studies in Section 5.2 that compare Ada3D against standard 3D convolution and group-convolution variants. We acknowledge that a more targeted ablation isolating the exact two-step combination plus adaptive biases (versus per-voxel adaptation without combination) is not explicitly presented. We will add this specific ablation experiment in the revision to directly quantify the contribution of the proposed kernel-generation step. revision: yes
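For reference, the two fusion-specific metrics the rebuttal cites have standard definitions; the numpy sketch below follows those conventions and is not the paper's evaluation code. `ref` and `fused` are assumed (bands, H, W) arrays, and `ratio` is the usual spatial resolution ratio (e.g. 4 for 4× pansharpening).

```python
# Standard SAM and ERGAS definitions (our sketch, not the paper's code).
import numpy as np

def sam_degrees(ref: np.ndarray, fused: np.ndarray) -> float:
    """Mean Spectral Angle Mapper over all pixels, in degrees."""
    r = ref.reshape(ref.shape[0], -1)          # (bands, pixels)
    f = fused.reshape(fused.shape[0], -1)
    cos = (r * f).sum(0) / (np.linalg.norm(r, axis=0)
                            * np.linalg.norm(f, axis=0) + 1e-12)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())

def ergas(ref: np.ndarray, fused: np.ndarray, ratio: float) -> float:
    """ERGAS: scale-normalized mean of per-band relative RMSE."""
    rmse = np.sqrt(((ref - fused) ** 2).mean(axis=(1, 2)))   # per-band RMSE
    mean = ref.mean(axis=(1, 2))                             # per-band reference mean
    return float(100.0 / ratio * np.sqrt(((rmse / mean) ** 2).mean()))
```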

Circularity Check

0 steps flagged

No circularity: Ada3D is a forward architectural proposal with empirical SOTA claims

full rationale

The paper introduces Ada3D as a new convolutional layer that generates per-voxel 3D kernels via a two-step spatial-spectral process plus adaptive biases, then reports empirical gains on five datasets. No equations, predictions, or first-principles derivations are present that reduce the claimed performance or kernel adaptivity to quantities defined by the method itself. The method is defined independently of its evaluation outcomes, and no self-citation chains or fitted-input renamings appear in the load-bearing steps. This is a standard empirical architecture paper whose central claim rests on experimental results rather than any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard deep-learning assumptions that convolutional networks can learn effective fusion mappings and that the proposed kernel-generation process is a valid way to achieve content awareness. No explicit free parameters or invented physical entities are described in the abstract.

axioms (1)
  • domain assumption: Convolutional neural networks can learn mappings that preserve spectral information when kernels are made spatially and spectrally adaptive.
    Implicit foundation for claiming superiority of Ada3D over fixed-kernel 3D convolutions.
invented entities (1)
  • Adaptive 3D Convolution (Ada3D) · no independent evidence
    purpose: To apply a unique set of 3D kernels to each input voxel for remote sensing image fusion.
    New convolutional operation introduced by the paper.

pith-pipeline@v0.9.0 · 5628 in / 1336 out tokens · 29032 ms · 2026-05-12T04:35:22.223070+00:00 · methodology

discussion (0)

