pith. machine review for the scientific record.

arxiv: 2605.09455 · v1 · submitted 2026-05-10 · 💻 cs.CV

Recognition: no theorem link

Adaptive 3D Convolution for Remote Sensing Image Fusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:35 UTC · model grok-4.3

classification 💻 cs.CV
keywords: remote sensing image fusion · adaptive 3D convolution · hyperspectral imaging · content-aware kernels · spectral preservation · group convolution · deep learning fusion methods

The pith

Adaptive 3D convolution applies unique kernels to each voxel to fuse remote sensing images while preserving spectral detail better than 2D or fixed 3D methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that remote sensing image fusion suffers when treated as a 2D problem because spectral information gets encoded into feature channels and distorted. Treating the data as a true 3D volume helps, but standard 3D convolutions apply identical kernels everywhere and remain computationally heavy. The proposed solution generates a distinct 3D kernel for every voxel by first pulling spatial kernels from the high-resolution input and spectral kernels from the low-resolution input, then combining those kernels into content-aware filters and adding per-voxel biases. Group convolution keeps the approach efficient, and experiments on five datasets report state-of-the-art fusion quality.
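A minimal numpy illustration, ours rather than the paper's, of that distortion argument (the paper's Figure 2 makes it with rank(A) = 2 < L): when a linear channel encoding mixes L bands through a rank-deficient matrix, the null-space component of every spectrum is unrecoverable, even by the best linear decoder.

```python
# Illustrative only: a rank-deficient channel encoding of L = 3 spectral bands,
# echoing the rank(A) = 2 < L case in the paper's Figure 2.
import numpy as np

rng = np.random.default_rng(0)
L = 3                                   # spectral bands
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])         # 2 x 3 encoding matrix, rank 2 < L

spectra = rng.normal(size=(1000, L))    # 1000 pixels, one spectrum each
encoded = spectra @ A.T                 # what a 1x1 "2D" conv layer would emit

# Even the optimal linear decoder (the pseudo-inverse) cannot recover the
# component of each spectrum lying in the null space of A.
decoded = encoded @ np.linalg.pinv(A).T
rel_err = np.linalg.norm(decoded - spectra) / np.linalg.norm(spectra)
print(f"relative spectral error after encode/decode: {rel_err:.2f}")  # ~0.58
```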

Core claim

Ada3D produces and applies a unique set of 3D kernels to each input voxel by deriving spatial kernels from the high-resolution source and spectral kernels from the low-resolution source, combining them into content-aware 3D kernels, and supplementing the result with adaptive per-voxel biases, all while using group convolution to control cost. This per-voxel adaptivity captures fine-grained details and integrates spatial-spectral information more effectively than either channel-stacked 2D convolutions or uniform 3D convolutions.

What carries the argument

Adaptive 3D Convolution (Ada3D), a layer that creates a distinct 3D kernel for every voxel through a two-step spatial-spectral combination plus adaptive biases.
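A minimal PyTorch sketch of the per-voxel idea, under assumed shapes and a single hypothetical kernel generator. It is not the paper's exact design: Ada3D derives spatial kernels from the high-resolution source and spectral kernels from the low-resolution source, normalizes each kernel field, and works in groups (see its Figures 4 and 5); here one 1×1×1 generator stands in for that whole pipeline.

```python
# Sketch of per-voxel adaptive 3D convolution (assumed shapes and generators).
import torch
import torch.nn.functional as F

B, L, H, W, k = 2, 8, 16, 16, 3          # batch, bands, height, width, kernel size

x    = torch.randn(B, 1, L, H, W)        # input volume, bands kept as a real dimension
feat = torch.randn(B, 16, L, H, W)       # conditioning features (hypothetical)

kernel_gen = torch.nn.Conv3d(16, k ** 3, kernel_size=1)  # one k^3 kernel per voxel
bias_gen   = torch.nn.Conv3d(16, 1, kernel_size=1)       # one bias per voxel

kernels = kernel_gen(feat)               # (B, k^3, L, H, W)
biases  = bias_gen(feat)                 # (B, 1,   L, H, W)

# Gather the k x k x k neighborhood around every voxel of the input volume.
pad = k // 2
xp = F.pad(x, (pad,) * 6)                # pad W, H, and the band dimension
patches = (xp.unfold(2, k, 1)            # -> (B, 1, L, H, W, k, k, k)
             .unfold(3, k, 1)
             .unfold(4, k, 1)
             .reshape(B, 1, L, H, W, k ** 3))

# Per-voxel convolution: each neighborhood weighted by its own kernel.
kernels = kernels.permute(0, 2, 3, 4, 1).unsqueeze(1)    # (B, 1, L, H, W, k^3)
out = (patches * kernels).sum(-1) + biases               # (B, 1, L, H, W)
print(out.shape)                         # torch.Size([2, 1, 8, 16, 16])
```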

If this is right

  • Spectral distortions introduced by encoding spectral bands as 2D feature channels are reduced when spectral data is kept as an explicit dimension with adaptive 3D kernels.
  • Fixed-kernel 3D convolutions become sub-optimal because they cannot adjust to local content variations that Ada3D addresses with per-voxel kernels.
  • Group convolution allows full adaptivity without the full computational cost of naive per-voxel 3D filtering (a rough cost sketch follows this list).
  • The method reaches state-of-the-art quantitative and qualitative results across five remote sensing fusion datasets.
  • Per-voxel adaptive biases further refine detail recovery at each location in the fused output.
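Back-of-envelope arithmetic for the grouping point above, with illustrative numbers that are not the paper's: the kernel a generator must emit for each voxel shrinks by the group factor, which is what makes "a kernel per voxel" tractable at image scale.

```python
# Illustrative cost arithmetic (example sizes, not from the paper).
def per_voxel_kernel_elems(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Elements in one voxel's generated kernel for a (grouped) 3D conv."""
    return c_out * (c_in // groups) * k ** 3

c_in = c_out = 32
k = 3
voxels = 31 * 256 * 256                 # e.g. a 31-band 256x256 hyperspectral cube

full    = per_voxel_kernel_elems(c_in, c_out, k)             # groups = 1
grouped = per_voxel_kernel_elems(c_in, c_out, k, groups=32)  # depthwise-style

print(f"full per-voxel kernels:    {full * voxels / 1e9:.1f} G elements")   # ~56.2
print(f"grouped per-voxel kernels: {grouped * voxels / 1e9:.1f} G elements")  # ~1.8
```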

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The two-step kernel generation may extend naturally to other multi-modal fusion problems where data arrive in separate spatial and attribute dimensions.
  • Content-adaptive 3D processing could improve efficiency in downstream remote sensing tasks such as classification or change detection that rely on fused imagery.
  • Making 3D convolution both adaptive and grouped suggests a route to scaling spectral fusion to very large satellite archives.
  • Testing the same kernel-combination logic on non-remote-sensing volumetric data would clarify how far the adaptivity principle travels.

Load-bearing premise

That kernels built from separate spatial and spectral sources and applied per voxel will integrate information more accurately than fixed kernels without creating new artifacts.

What would settle it

A fusion dataset or test scene in which a standard 3D convolution or 2D method records lower reconstruction error or fewer visible spectral artifacts than Ada3D.

Figures

Figures reproduced from arXiv: 2605.09455 by Liang-Jian Deng, Shang-Qi Deng, Siran Peng, Xiangyu Zhu, Zhen Lei.

Figure 1: Comparison of Ada3D with several convolutional paradigms, includ…
Figure 2: Left: A visual representation of how 2D modeling introduces spectral distortions. (a) The vector space of spectral data, where the number of spectral bands L is set to 3 for simplicity. (b) The vector space of spectral information, which is encoded into feature map channels via a convolutional layer. This scenario illustrates the case where rank(A) = 2 < L, resulting in a loss of spectral information. Righ…
Figure 3: Graphical illustration of standard 2D convolution, adaptive 2D…
Figure 4: Graphical illustration of our network architecture, which consists of two main sections: a spatial branch and a spectral branch. The former focuses on…
Figure 5: The Ada3D block. Taking Fa and Fb as inputs, the kernel and bias generators produce adaptive kernels K and biases D. The acquired K and D are then applied to Fb, facilitating the integration of spatial and spectral information. The 1D and 2D convolutional layers use kernel sizes of 3 and 3 × 3, both with a stride and padding of 1. Normalization is applied over each k × k × k kernel field, and the number o…
Figure 6: Qualitative evaluation results for hyper-spectral pansharpening on the WDC, Botswana, and Pavia datasets. Row 1: Pseudo-color images of spectral…
Figure 7: Comparison of spectral vectors at three randomly selected spatial locations from a testing sample in the WDC dataset.
Figure 8: Qualitative evaluation results on a reduced-resolution example from the WV3 dataset, which belongs to the pansharpening task. Rows 1 and 3:…
Figure 9: Graphical illustration of Ada3D's applications.
Figure 11: Qualitative evaluation results of SOTA HISR methods on two testing…
read the original abstract

Remote sensing image fusion aims to create a high-resolution multi/hyper-spectral image from a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Recently, deep learning (DL) techniques have shown significant effectiveness in this area. Most DL-based methods approach image fusion as a 2D problem by encoding spectral information into feature map channels. However, our research suggests that this strategy introduces notable spectral distortions. In contrast, some methods consider spectral data as an additional dimension, utilizing standard 3D convolutions to preserve spectral information. Nevertheless, in a standard 3D convolutional layer, the same set of kernels is applied across all input regions, which we have found to be sub-optimal for image fusion. Furthermore, standard 3D convolutions necessitate substantial computational resources. To address these challenges, we propose a novel convolutional paradigm called Adaptive 3D Convolution (Ada3D) for remote sensing image fusion. Ada3D applies a unique set of 3D kernels to each input voxel, enabling the capture of fine-grained details. These adaptive kernels are generated through a two-step process: (i) spatial and spectral kernels are derived from their respective image sources; (ii) these two types of kernels are then combined to form content-aware 3D kernels that effectively integrate spatial and spectral information. Additionally, adaptive biases are introduced to enhance the convolutional outcome at the voxel level. Furthermore, we incorporate the group convolution technique to reduce computational complexity. As a result, Ada3D offers full adaptivity in an efficient manner. Evaluation results across five datasets demonstrate that our method achieves SOTA performance, underscoring the superiority of Ada3D. The code is available at https://github.com/PSRben/Ada3D.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Adaptive 3D Convolution (Ada3D) for remote sensing image fusion to address spectral distortions in 2D DL methods and the sub-optimality of fixed kernels plus high compute in standard 3D convolutions. Ada3D generates per-voxel content-aware 3D kernels via a two-step process (spatial kernels from high-resolution source, spectral kernels from low-resolution source, then combined) plus per-voxel adaptive biases, incorporates group convolution for efficiency, and claims SOTA results on five datasets.

Significance. If the superiority claims hold with proper validation, Ada3D could provide a practical advance in handling spatial-spectral fusion by enabling local adaptivity without full per-voxel kernel computation, potentially reducing artifacts while maintaining efficiency. The public code release supports reproducibility.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'Evaluation results across five datasets demonstrate that our method achieves SOTA performance' is unsupported by any reported metrics, baselines, ablation tables, or error analysis, which is load-bearing for the superiority assertion and prevents verification of whether the two-step kernel combination (rather than capacity or training differences) drives the gains.
  2. [Abstract] Abstract (method description): No quantitative comparison or ablation is described that isolates the effect of the two-step spatial-spectral kernel generation plus per-voxel biases against simpler alternatives such as standard 3D convolution, group-conv-only, or per-voxel adaptation without the combination step; this is required to substantiate that the specific construction is what improves integration over fixed kernels.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below. The full manuscript contains the supporting experiments, but we agree the abstract can be strengthened for clarity and will revise accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'Evaluation results across five datasets demonstrate that our method achieves SOTA performance' is unsupported by any reported metrics, baselines, ablation tables, or error analysis, which is load-bearing for the superiority assertion and prevents verification of whether the two-step kernel combination (rather than capacity or training differences) drives the gains.

    Authors: The abstract is a concise summary; the full paper reports the requested details in Section 4 (Experiments) and Section 5 (Ablations), with Tables 1–5 providing quantitative metrics (PSNR, SSIM, SAM, ERGAS; the fusion-specific ones are sketched after these responses), baseline comparisons, and error analysis across the five datasets. These tables directly support the SOTA claim and allow verification that gains arise from the Ada3D design rather than capacity alone. We will revise the abstract to include one or two key quantitative highlights (e.g., average PSNR improvement) to make the claim self-contained. revision: yes

  2. Referee: [Abstract] Abstract (method description): No quantitative comparison or ablation is described that isolates the effect of the two-step spatial-spectral kernel generation plus per-voxel biases against simpler alternatives such as standard 3D convolution, group-conv-only, or per-voxel adaptation without the combination step; this is required to substantiate that the specific construction is what improves integration over fixed kernels.

    Authors: The manuscript describes the two-step process in Section 3 and includes ablation studies in Section 5.2 that compare Ada3D against standard 3D convolution and group-convolution variants. We acknowledge that a more targeted ablation isolating the exact two-step combination plus adaptive biases (versus per-voxel adaptation without combination) is not explicitly presented. We will add this specific ablation experiment in the revision to directly quantify the contribution of the proposed kernel-generation step. revision: yes
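For reference, the two fusion-specific metrics the rebuttal cites have standard definitions; the numpy sketch below follows those conventions and is not the paper's evaluation code. `ref` and `fused` are assumed (bands, H, W) arrays, and `ratio` is the usual spatial resolution ratio (e.g. 4 for 4× pansharpening).

```python
# Standard SAM and ERGAS definitions (our sketch, not the paper's code).
import numpy as np

def sam_degrees(ref: np.ndarray, fused: np.ndarray) -> float:
    """Mean Spectral Angle Mapper over all pixels, in degrees."""
    r = ref.reshape(ref.shape[0], -1)          # (bands, pixels)
    f = fused.reshape(fused.shape[0], -1)
    cos = (r * f).sum(0) / (np.linalg.norm(r, axis=0)
                            * np.linalg.norm(f, axis=0) + 1e-12)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())

def ergas(ref: np.ndarray, fused: np.ndarray, ratio: float) -> float:
    """ERGAS: scale-normalized mean of per-band relative RMSE."""
    rmse = np.sqrt(((ref - fused) ** 2).mean(axis=(1, 2)))   # per-band RMSE
    mean = ref.mean(axis=(1, 2))                             # per-band reference mean
    return float(100.0 / ratio * np.sqrt(((rmse / mean) ** 2).mean()))
```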

Circularity Check

0 steps flagged

No circularity: Ada3D is a forward architectural proposal with empirical SOTA claims

full rationale

The paper introduces Ada3D as a new convolutional layer that generates per-voxel 3D kernels via a two-step spatial-spectral process plus adaptive biases, then reports empirical gains on five datasets. No equations, predictions, or first-principles derivations are present that reduce the claimed performance or kernel adaptivity to quantities defined by the method itself. The method is defined independently of its evaluation outcomes, and no self-citation chains or fitted-input renamings appear in the load-bearing steps. This is a standard empirical architecture paper whose central claim rests on experimental results rather than any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard deep-learning assumptions that convolutional networks can learn effective fusion mappings and that the proposed kernel-generation process is a valid way to achieve content awareness. No explicit free parameters or invented physical entities are described in the abstract.

axioms (1)
  • domain assumption: Convolutional neural networks can learn mappings that preserve spectral information when kernels are made spatially and spectrally adaptive.
    Implicit foundation for claiming superiority of Ada3D over fixed-kernel 3D convolutions.
invented entities (1)
  • Adaptive 3D Convolution (Ada3D) · no independent evidence
    purpose: To apply a unique set of 3D kernels to each input voxel for remote sensing image fusion.
    New convolutional operation introduced by the paper.

pith-pipeline@v0.9.0 · 5628 in / 1336 out tokens · 29032 ms · 2026-05-12T04:35:22.223070+00:00 · methodology

discussion (0)

