Multigrain-aware Semantic Prototype Scanning and Tri-Token Prompt Learning Embraced High-Order RWKV for Pan-Sharpening

Jianhou Gan; Junfeng Li; Wenqi Ren; Wenyang Zhou; Xuanhua He; Xueheng Li

arxiv: 2604.14622 · v1 · submitted 2026-04-16 · 💻 cs.CV

Junfeng Li, Wenyang Zhou, Xueheng Li, Xuanhua He, Jianhou Gan, Wenqi Ren

Pith reviewed 2026-05-10 11:31 UTC · model grok-4.3

keywords: pan-sharpening, RWKV, semantic prototype scanning, token prompting, image fusion, remote sensing, artifact suppression, locality-sensitive hashing

The pith

A multigrain semantic prototype scanning strategy with tri-token prompting in high-order RWKV produces superior pan-sharpening by enabling coherent global interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a pan-sharpening method that adapts high-order RWKV to respect semantic image structure instead of fixed positional order. It uses locality-sensitive hashing to form multigrain semantic prototypes that reorder tokens for context-aware processing. A tri-token prompting setup supplies global and prototype priors while a register token curbs noise, and an invertible Q-shift injects high-frequency details without parameter bloat. The approach targets the fusion of panchromatic and multispectral satellite data where semantic coherence reduces artifacts. Experimental comparisons show gains over prior RWKV and transformer baselines.

Core claim

The method introduces multigrain-aware semantic prototype scanning that leverages locality-sensitive hashing to group semantically related regions into multi-grain prototypes for context-aware token reordering; tri-token prompt learning that combines a global token, cluster-derived prototype tokens, and a learnable register token to supply semantic priors and suppress noisy representations; and an invertible Q-shift that applies center difference convolution plus multi-scale operations for lossless high-frequency feature transformation. Together these components allow high-order RWKV to achieve more coherent global interaction and fewer artifacts during pan-sharpening.
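The tri-token prompt described above can be illustrated with a minimal sketch of how the enhanced sequence, which the paper's Eq. (13) writes as [V; P_1, …, P_C; g; r], might be assembled. Everything below is an assumption made for illustration: prototypes are taken as cluster means, the global token as the mean of all tokens, and the register token as a fixed vector (in the paper it is learnable). The function names are hypothetical and do not come from the paper.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def build_enhanced_sequence(tokens, cluster_ids, register):
    """Return [tokens; prototypes; global; register] and the number of appended prompts.

    Assumption: one prototype per cluster, computed as the cluster mean.
    """
    clusters = {}
    for token, cluster_id in zip(tokens, cluster_ids):
        clusters.setdefault(cluster_id, []).append(token)
    prototypes = [mean_vector(clusters[cid]) for cid in sorted(clusters)]
    global_token = mean_vector(tokens)  # assumption: global token = mean of all tokens
    prompts = prototypes + [global_token, list(register)]
    return list(tokens) + prompts, len(prompts)


def strip_prompts(sequence, num_prompts):
    """Drop the appended prompt tokens so only image tokens remain."""
    return sequence[:len(sequence) - num_prompts]
```

After the sequence model has run, `strip_prompts` discards the prompt positions, so the register token can absorb noisy activations without contributing to the reconstructed image.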

What carries the argument

Multigrain-aware semantic prototype scanning that reorders tokens via locality-sensitive hashing of semantically related regions, paired with tri-token prompting consisting of global, prototype, and register tokens.
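A minimal sketch of LSH-driven token reordering follows. It assumes random-hyperplane hashing (SimHash) as the hash family and treats "multigrain" as running the same procedure with several bit widths; the paper's actual hash family, grain counts, and tie-breaking are not given in this summary, so all names and defaults here are hypothetical.

```python
import random


def simhash_codes(tokens, num_bits, seed=0):
    """Bucket each token by the sign pattern of num_bits random projections."""
    rng = random.Random(seed)
    dim = len(tokens[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
    codes = []
    for token in tokens:
        code = 0
        for plane in planes:
            dot = sum(p * t for p, t in zip(plane, token))
            code = (code << 1) | (1 if dot >= 0.0 else 0)
        codes.append(code)
    return codes


def semantic_order(codes):
    """Stable permutation that places tokens sharing a hash code next to each other."""
    return sorted(range(len(codes)), key=lambda i: codes[i])


def apply_order(items, order):
    """Reorder items so that position k holds items[order[k]]."""
    return [items[i] for i in order]


def restore_order(items, order):
    """Undo apply_order, returning items to their original (raster) positions."""
    restored = [None] * len(items)
    for position, original_index in enumerate(order):
        restored[original_index] = items[position]
    return restored


def multigrain_orders(tokens, bit_widths=(1, 2, 4), seed=0):
    """One ordering per granularity: fewer bits give coarser groups."""
    return {bits: semantic_order(simhash_codes(tokens, bits, seed + bits))
            for bits in bit_widths}
```

Because the reordering is a permutation, `restore_order` recovers the raster layout exactly, which is one way spatial information can be retained after semantic scanning.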

If this is right

Context-aware token reordering produces more coherent global modeling inside linear-complexity sequence architectures.
The register token reduces artifact-prone intermediate features during image reconstruction.
Invertible multi-scale Q-shift supplies high-frequency content without expanding receptive fields through extra parameters.
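To make the "invertible" and "multi-scale" properties concrete, here is a hedged sketch of a parameter-free quad-directional shift. It assumes cyclic (wrap-around) shifts, which make the operation a pure permutation of pixels and therefore exactly reversible, and it assigns a different step size to each channel group as a stand-in for multiple scales. The paper's own Q-shift definition is not reproduced in this summary, so treat the function names, the direction table, and the grouping rule as hypothetical.

```python
def roll2d(channel, dy, dx):
    """Cyclically shift a 2-D grid down by dy rows and right by dx columns."""
    height, width = len(channel), len(channel[0])
    return [[channel[(y - dy) % height][(x - dx) % width] for x in range(width)]
            for y in range(height)]


DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def q_shift(channels, step=1, inverse=False):
    """Shift channel i along DIRECTIONS[i % 4] by `step` pixels (cyclic)."""
    sign = -1 if inverse else 1
    shifted = []
    for index, channel in enumerate(channels):
        dy, dx = DIRECTIONS[index % 4]
        shifted.append(roll2d(channel, sign * dy * step, sign * dx * step))
    return shifted


def multi_scale_q_shift(channels, steps=(1, 2), inverse=False):
    """Split channels into contiguous groups and use one step size per group."""
    group = -(-len(channels) // len(steps))  # ceiling division
    out = []
    for g, step in enumerate(steps):
        out.extend(q_shift(channels[g * group:(g + 1) * group], step, inverse))
    return out
```

Calling `multi_scale_q_shift(x, inverse=True)` on the output of `multi_scale_q_shift(x)` returns `x` unchanged. A zero-padded shift would lose border pixels and would not have this property.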
The overall pipeline yields measurable gains in pan-sharpening quality over conventional raster-order RWKV.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same semantic grouping and prompting pattern could apply to other dense prediction tasks that suffer from raster-order bias.
Register tokens for noise suppression may prove useful in any efficient vision model that processes long image sequences.
Lossless invertible shifts offer a general route to preserve detail when scaling linear models to higher resolutions.

Load-bearing premise

That locality-sensitive hashing will reliably form semantically meaningful multigrain prototypes that improve fusion without adding bias or losing spatial information.

What would settle it

On standard pan-sharpening benchmarks such as WorldView or QuickBird, if the method fails to exceed baseline RWKV and transformer results in PSNR, SSIM, or visual artifact reduction, the claim of coherent interaction and superiority would not hold.
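For reference, PSNR, one of the metrics named above, can be computed as in the following sketch. It assumes single-band images stored as lists of rows with values in [0, max_value]; a multi-band pan-sharpening evaluation would typically average the squared error over all bands as well.

```python
import math


def psnr(reference, prediction, max_value=1.0):
    """PSNR in dB between two equally sized 2-D grids (lists of rows)."""
    squared_errors = [(r - p) ** 2
                      for ref_row, pred_row in zip(reference, prediction)
                      for r, p in zip(ref_row, pred_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match to the ground truth, so the settling test above amounts to comparing this number (and SSIM) across methods on the same test split.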

Figures

Figures reproduced from arXiv: 2604.14622 by Jianhou Gan, Junfeng Li, Wenqi Ren, Wenyang Zhou, Xuanhua He, Xueheng Li.

**Figure 1.** Comparison between the conventional Transformer, Vision RWKV, and our proposed Multigrain-aware Semantic Prototype …

**Figure 2.** The Multigrain-aware Semantic Prototype Scanning architecture. It replaces standard recurrent scanning with a semantic-driven …

**Figure 3.** Semantic-guided scanning MTRWKV. It replaces the standard raster scan with a semantic order derived from feature clustering, …

**Figure 4.** WKV-moment/sharing mechanism. where $W_r$ and $b_r$ are learnable parameters. (3) Semantic-aware Processing. The enhanced token sequence undergoes semantic-aware transformation

$$V_s^{\mathrm{enh}} = [V_s^{I};\, P_1, \ldots, P_C;\, g;\, r], \tag{13}$$

$$\mathrm{wkv} = \text{Bi-WKV}(K_s, V_s^{\mathrm{enh}}), \tag{14}$$

$$O_s = W_o(\sigma(R_s) \odot \mathrm{wkv}), \tag{15}$$

where $W_o$ denotes the output projection. (4) Hierarchical Feature Integration. The processed outputs are decomposed and integrated thro…

**Figure 6.** Visual comparison evidenced by the MSE residue map between ground truth and prediction.

**Figure 7.** Feature analysis of prompt-learning tokens and model …

read the original abstract

In this work, we propose a Multigrain-aware Semantic Prototype Scanning paradigm for pan-sharpening, built upon a high-order RWKV architecture and a tri-token prompting mechanism derived from semantic clustering. Specifically, our method contains three key components: 1) Multigrain-aware Semantic Prototype Scanning. Although RWKV offers an efficient linear-complexity alternative to Transformers, its conventional bidirectional raster scanning is still semantic-agnostic and prone to positional bias. To address this issue, we introduce a semantic-driven scanning strategy that leverages locality-sensitive hashing to group semantically related regions and construct multi-grain semantic prototypes, enabling context-aware token reordering and more coherent global interaction. 2) Tri-token Prompt Learning. We design a tri-token prompting mechanism consisting of a global token, cluster-derived prototype tokens, and a learnable register token. The global and prototype tokens provide complementary semantic priors for RWKV modeling, while the register token helps suppress noisy and artifact-prone intermediate representations. 3) Invertible Q-Shift. To counteract the loss of spatial details, we apply center difference convolution on the value pathway to inject high-frequency information, and introduce an invertible multi-scale Q-shift operation for efficient and lossless feature transformation without parameter-heavy receptive field expansion. Experimental results demonstrate the superiority of our method.
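The center difference convolution named in the abstract can be sketched as below. This follows the commonly used formulation, in which the response is a vanilla convolution minus theta times the center pixel scaled by the kernel sum; whether the paper uses exactly this form, a 3x3 kernel, or a mixing weight theta is an assumption here.

```python
def center_difference_conv(image, kernel, theta=1.0):
    """3x3 center-difference convolution over the valid interior of a 2-D grid.

    theta=0 gives a plain convolution; theta=1 responds only to differences
    between each neighbour and the center pixel.
    """
    height, width = len(image), len(image[0])
    kernel_sum = sum(sum(row) for row in kernel)
    output = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            vanilla = sum(kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            row.append(vanilla - theta * image[y][x] * kernel_sum)
        output.append(row)
    return output
```

With theta set to 1 the operator returns zero on flat regions and a non-zero response at edges, which is why it is a natural way to inject high-frequency information into the value pathway.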

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper puts together LSH-based multigrain scanning, tri-token prompts, and an invertible Q-shift inside high-order RWKV for pan-sharpening, but the abstract asserts superiority without any numbers or baselines to check it.

The main thing to know is that this work replaces RWKV's usual raster scan with a semantic-driven reordering that uses locality-sensitive hashing to build multi-grain prototypes, then feeds those into a tri-token prompt setup (global, prototype, and register tokens) plus a center-difference invertible Q-shift. The stated goal is more coherent global context and fewer artifacts in pan-sharpening while keeping linear complexity. That combination of pieces is not in the prior literature the abstract cites, so the architecture itself is new on the surface.

Referee Report

3 major / 3 minor

Summary. The paper proposes a Multigrain-aware Semantic Prototype Scanning paradigm for pan-sharpening built on high-order RWKV. It introduces three components: (1) semantic-driven scanning via locality-sensitive hashing to form multi-grain semantic prototypes for context-aware token reordering instead of raster order; (2) tri-token prompting with a global token, cluster-derived prototype tokens, and a learnable register token to supply semantic priors and suppress noisy representations; (3) an invertible Q-shift using center-difference convolution on the value path plus multi-scale shift for lossless high-frequency injection. The central claim is that this yields superior pan-sharpening performance.

Significance. If the experimental superiority holds after rigorous validation, the work could advance efficient linear-complexity alternatives to transformers for remote-sensing image fusion by adding semantic awareness to RWKV scanning while preserving details through the claimed lossless Q-shift. The explicit design of an invertible operation is a methodological strength that could be reusable if shown to be parameter-light and truly lossless.

major comments (3)

[Abstract] The central claim that 'Experimental results demonstrate the superiority of our method' is unsupported by any metrics, datasets, baselines, or error analysis in the provided text. This is load-bearing for the paper's contribution and requires the full Experiments section (including tables of PSNR/SSIM on standard benchmarks such as WorldView-3 or GaoFen) plus statistical significance tests to be evaluated.
[Method (Tri-token Prompt Learning)] The register token is asserted to 'suppress noisy and artifact-prone intermediate representations' without discarding useful high-frequency details, yet this rests on the ad-hoc axiom listed in the ledger with no derivation or ablation isolating its effect. An ablation removing the register token (and reporting the resulting artifact metrics) is needed to confirm it does not introduce new biases.
[Method (Multigrain-aware Semantic Prototype Scanning)] The number of multigrain levels, semantic prototypes, and LSH parameters are free parameters whose tuning is not shown to be independent of the target datasets. Without an ablation or sensitivity analysis in the Experiments section, the claim that LSH-based reordering enables 'more coherent global interaction' without positional bias remains circular.

minor comments (3)

[Method] The notation for the 'Invertible Q-Shift' (center-difference convolution and multi-scale shift) is introduced without an equation or diagram; a formal definition and a small proof sketch of invertibility would improve clarity.
[Abstract and Introduction] The abstract and method text introduce several new terms ('multigrain semantic prototypes', 'tri-token prompting', 'Invertible Q-Shift') without explicit comparison to prior RWKV vision adaptations or pan-sharpening works that already use semantic clustering or register tokens.
Figure captions and implementation details (e.g., exact dimensions of the three token types, initialization scheme, and shift parameters) are missing from the provided text and should be added for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for providing detailed and constructive feedback on our manuscript. Below, we respond to each major comment in turn, explaining our position and the changes we have made or will make to the paper.

Referee: [Abstract] The central claim that 'Experimental results demonstrate the superiority of our method' is unsupported by any metrics, datasets, baselines, or error analysis in the provided text. This is load-bearing for the paper's contribution and requires the full Experiments section (including tables of PSNR/SSIM on standard benchmarks such as WorldView-3 or GaoFen) plus statistical significance tests to be evaluated.

Authors: The abstract serves as a concise summary of the work, while the full manuscript contains a complete Experiments section with quantitative results. This section reports PSNR and SSIM values on standard benchmarks including WorldView-3 and GaoFen, along with comparisons to multiple baselines and visual analyses. To better support the abstract claim, we will revise the abstract to include a brief summary of the key quantitative gains. We will also incorporate statistical significance tests (such as paired t-tests across multiple runs) into the Experiments section of the revised manuscript.

revision: yes

Referee: [Method (Tri-token Prompt Learning)] The register token is asserted to 'suppress noisy and artifact-prone intermediate representations' without discarding useful high-frequency details, yet this rests on the ad-hoc axiom listed in the ledger with no derivation or ablation isolating its effect. An ablation removing the register token (and reporting the resulting artifact metrics) is needed to confirm it does not introduce new biases.

Authors: We thank the referee for highlighting the need for explicit validation of the register token. The token is introduced to buffer noisy features within the tri-token prompting design. In the revised manuscript, we will add an ablation study that removes the register token and reports the resulting performance, including artifact-sensitive metrics such as edge preservation and spatial correlation coefficients. This will empirically show that the token improves noise suppression while preserving high-frequency details, without introducing new biases.

revision: yes

Referee: [Method (Multigrain-aware Semantic Prototype Scanning)] The number of multigrain levels, semantic prototypes, and LSH parameters are free parameters whose tuning is not shown to be independent of the target datasets. Without an ablation or sensitivity analysis in the Experiments section, the claim that LSH-based reordering enables 'more coherent global interaction' without positional bias remains circular.

Authors: We agree that sensitivity to these design choices must be demonstrated to avoid circular reasoning. The revised manuscript will include a dedicated sensitivity analysis in the Experiments section. We will report results when varying the number of multigrain levels, the number of semantic prototypes, and key LSH parameters (e.g., hash functions and bucket sizes) across the same datasets. The analysis will show that performance gains from semantic-driven reordering remain consistent and are not tied to dataset-specific tuning, thereby supporting the reduction in positional bias.

revision: yes
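The paired t-test mentioned in the first response can be sketched with the standard library alone. The function below computes only the t statistic and degrees of freedom for per-run scores of two methods evaluated on the same runs. Turning that into a p-value needs a t-distribution table or a library routine such as `scipy.stats.ttest_rel`. The function name and error handling are illustrative choices, not part of the paper or the rebuttal.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-run scores of two methods."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two paired runs")
    spread = stdev(differences)  # sample standard deviation of the differences
    if spread == 0.0:
        raise ValueError("differences are constant; t statistic is undefined")
    return mean(differences) / (spread / math.sqrt(n)), n - 1
```

Pairing matters here because runs that share a seed or data split tend to move together. Testing the per-run differences removes that shared variance from the comparison.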

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper proposes an architectural extension to RWKV for pan-sharpening via three explicit design components (LSH-based multigrain semantic scanning, tri-token prompting with global/prototype/register tokens, and invertible Q-shift via center-difference convolution). These are motivated as remedies for identified limitations of raster-order RWKV (semantic-agnosticism and positional bias) and are validated by experimental superiority rather than any closed-form derivation or first-principles prediction. No equations or steps in the abstract or described method reduce by construction to fitted parameters renamed as predictions, self-citations, or self-definitional loops; the central claim remains an empirical demonstration of a new model, which is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 3 invented entities

The central claim rests on several ad-hoc design choices and domain assumptions about semantic grouping and token utility that lack independent verification in the provided abstract.

free parameters (2)

number of multigrain levels and semantic prototypes
Determines the granularity of LSH-based grouping and prototype construction; chosen to enable context-aware reordering.

dimensions and initialization of global, prototype, and register tokens
Affect the tri-token prompting mechanism and are standard learnable parameters in the model.

axioms (2)

[domain assumption] Locality-sensitive hashing can reliably group semantically related image regions for coherent token reordering in RWKV.
Invoked directly in the description of the multigrain-aware semantic prototype scanning component.

[ad hoc to paper] The register token can suppress noisy intermediate representations without discarding useful high-frequency details.
Part of the tri-token prompt learning design to counteract artifacts.

invented entities (3)

Multigrain semantic prototypes · no independent evidence
purpose: Enable semantic-driven, context-aware token reordering to overcome positional bias in standard RWKV scanning.
New construct introduced to address semantic-agnostic limitations of bidirectional raster scanning.

Tri-token prompting mechanism · no independent evidence
purpose: Provide complementary semantic priors from global and cluster tokens while using the register token to reduce noise.
Novel prompting design tailored to the RWKV architecture for this task.

Invertible Q-Shift operation · no independent evidence
purpose: Inject high-frequency spatial details via center difference convolution and perform lossless multi-scale feature transformation.
Proposed to counteract loss of spatial information without heavy parameter expansion.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1] Bruno Aiazzi, Stefano Baronti, and Massimo Selva. Improving component substitution pansharpening through multivariate regression of ms + pan data. IEEE Transactions on Geoscience and Remote Sensing, 45(10):3230–3239, 2007.

[2] Jiajun Cai and Bo Huang. Super-resolution-guided progressive pansharpening based on a deep convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 59(6):5206–5220, 2020.

[3] W. Joseph Carper, Thomas M. Lillesand, and Ralph W. Kiefer. The use of intensity-hue-saturation transformations for merging spot panchromatic and multispectral image data. Photogrammetric Engineering and Remote Sensing, 56(4):459–467, 1990.

[4] Chen Chen, Yeqing Li, Wei Liu, and Junzhou Huang. Sirf: Simultaneous satellite image registration and fusion in a unified framework. IEEE Transactions on Image Processing, 24(11):4213–4224, 2015.

[5] Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, and Wenhai Wang. Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures, 2025.

[6] Yiqing Fan, Chaoqun Hong, Guanghui Zeng, and Lijuan Liu. A deep convolutional encoder-decoder-restorer architecture for image deblurring. Neural Processing Letters, 56(1):27, 2024.

[7] Xueyang Fu, Zihuang Lin, Yue Huang, and Xinghao Ding. A variational pan-sharpening with local gradient constraints. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10265–10274, 2019.

[8] Morteza Ghahremani and Hassan Ghassemian. Nonlinear ihs: A promising method for pan-sharpening. IEEE Geoscience and Remote Sensing Letters, 13(11):1606–1610, 2016.

[9] A. R. Gillespie, A. B. Kahle, and R. E. Walker. Color enhancement of highly correlated images. ii. channel ratio and "chromaticity" transformation techniques. Remote Sensing of Environment, 22(3):343–365, 1987.

[10] Alan R Gillespie, Anne B Kahle, and Richard E Walker. Color enhancement of highly correlated images. ii. channel ratio and "chromaticity" transformation techniques. Remote Sensing of Environment, 22(3):343–365, 1987.

[11] R. Haydn, G. W. Dalke, J. Henkel, and J. E. Bare. Application of the ihs color transform to the processing of multisensor data and image enhancement. National Academy of Sciences of the United States of America, 79(13):571–577, 1982.

[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[13] Craig A Laben and Bernard V Brower. Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening, 2000. US Patent 6,011,875.

[14] Zongying Lai, Xiaobo Qu, Yunsong Liu, Di Guo, Jing Ye, Zhifang Zhan, and Zhong Chen. Image reconstruction of compressed sensing mri using graph-based redundant wavelet transform. Medical Image Analysis, 27:93–104, 2016.

[15] Wenzhi Liao, Xin Huang, Frieke Van Coillie, Guy Thoonen, Aleksandra Pižurica, Paul Scheunders, and Wilfried Philips. Two-stage fusion of thermal hyperspectral and visible rgb image by pca and guided filter. In 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pages 1–4. IEEE, 2015.

[16] J. G. Liu. Smoothing filter-based intensity modulation: A spectral preserve image fusion technique for improving spatial details. International Journal of Remote Sensing, 21(18):3461–3472, 2000.

[17] Giuseppe Masi, Davide Cozzolino, Luisa Verdoliva, and Giuseppe Scarpa. Pansharpening by convolutional neural networks. Remote Sensing, 8(7):594, 2016.

[18] Xinran Qin, Yuhui Quan, and Hui Ji. Enhanced deep unrolling networks for snapshot compressive hyperspectral imaging. Neural Networks, 174:106250, 2024.

[19] Yuhui Quan, Xinran Qin, Tongyao Pang, and Hui Ji. Siamese cooperative learning for unsupervised image reconstruction from incomplete measurements. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4866–4879,

[20] Xin Tian, Yuerong Chen, Changcai Yang, and Jiayi Ma. Variational pansharpening by exploiting cartoon-texture similarities. IEEE Transactions on Geoscience and Remote Sensing, pages 1–16, 2021.

[21] Xin Tian, Kun Li, Zhongyuan Wang, and Jiayi Ma. Vp-net: An interpretable deep network for variational pansharpening. IEEE Transactions on Geoscience and Remote Sensing, pages 1–16, 2021.

[22] Jiangang Wang, Yuning Cui, Yawen Li, Wenqi Ren, and Xiaochun Cao. Omnidirectional image super-resolution via bi-projection fusion. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5454–5462, 2024.

[23] Jiangang Wang, Qingnan Fan, Jinwei Chen, Hong Gu, Feng Huang, and Wenqi Ren. Rap-sr: Restoration prior enhancement in diffusion models for realistic image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.

[24] Zhong-Cheng Wu, Ting-Zhu Huang, Liang-Jian Deng, Jin-Fan Hu, and Gemine Vivone. Vo+net: An adaptive approach using variational optimization and deep learning for panchromatic sharpening. IEEE Transactions on Geoscience and Remote Sensing, pages 1–16, 2021.

[25] Qi Xie, Minghao Zhou, Qian Zhao, Zongben Xu, and Deyu Meng. Mhf-net: An interpretable deep network for multispectral and hyperspectral image fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1457–1473,

[26] Shuang Xu, Jiangshe Zhang, Zixiang Zhao, Kai Sun, Junmin Liu, and Chunxia Zhang. Deep gradient projection networks for pan-sharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1366–1375, 2021.

[27] Gang Yang, Xiangyong Cao, Wenzhe Xiao, Man Zhou, Aiping Liu, Xun Chen, and Deyu Meng. Panflownet: A flow-based deep network for pan-sharpening. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 16857–16867, 2023.

[28] Junfeng Yang, Xueyang Fu, Yuwen Hu, Yue Huang, Xinghao Ding, and John Paisley. Pannet: A deep network architecture for pan-sharpening. In Proceedings of the IEEE international conference on computer vision, pages 5449–5457, 2017.

[29] Q. Yuan, Y. Wei, X. Meng, H. Shen, and L. Zhang. A multi-scale and multidepth convolutional neural network for remote sensing imagery pan-sharpening. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(3):978–989, 2018.

[30] Man Zhou, Jie Huang, Keyu Yan, Hu Yu, Xueyang Fu, Aiping Liu, Xian Wei, and Feng Zhao. Spatial-frequency domain information integration for pan-sharpening. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVIII, pages 274–291. Springer, 2022.

[31] Man Zhou, Keyu Yan, Jie Huang, Zihe Yang, Xueyang Fu, and Feng Zhao. Mutual information-driven pan-sharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1798–1808,

[1] [1]

Im- proving component substitution pansharpening through mul- tivariate regression of ms + pan data.IEEE Transactions on Geoscience and Remote Sensing, 45(10):3230–3239, 2007

Bruno Aiazzi, Stefano Baronti, and Massimo Selva. Im- proving component substitution pansharpening through mul- tivariate regression of ms + pan data.IEEE Transactions on Geoscience and Remote Sensing, 45(10):3230–3239, 2007. 3

work page 2007

[2] [2]

Super-resolution-guided progres- sive pansharpening based on a deep convolutional neural net- work.IEEE Transactions on Geoscience and Remote Sensing, 59(6):5206–5220, 2020

Jiajun Cai and Bo Huang. Super-resolution-guided progres- sive pansharpening based on a deep convolutional neural net- work.IEEE Transactions on Geoscience and Remote Sensing, 59(6):5206–5220, 2020. 3, 7

work page 2020

[3] [3]

Wjoseph Carper, Thomasm Lillesand, and Ralphw Kiefer. The use of intensity-hue-saturation transformations for merg- ing spot panchromatic and multispectral image data.Pho- togrammetric Engineering and remote sensing, 56(4):459– 467, 1990. 3

work page 1990

[4] [4]

Sirf: Simultaneous satellite image registration and fusion in a uni- fied framework.IEEE Transactions on Image Processing, 24 (11):4213–4224, 2015

Chen Chen, Yeqing Li, Wei Liu, and Junzhou Huang. Sirf: Simultaneous satellite image registration and fusion in a uni- fied framework.IEEE Transactions on Image Processing, 24 (11):4213–4224, 2015. 3

work page 2015

[5] [5]

Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures, 2025

Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, and Wenhai Wang. Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures, 2025. 4

work page 2025

[6] [6]

A deep convolutional encoder-decoder-restorer architecture for image deblurring.Neural Processing Letters, 56(1):27, 2024

Yiqing Fan, Chaoqun Hong, Guanghui Zeng, and Lijuan Liu. A deep convolutional encoder-decoder-restorer architecture for image deblurring.Neural Processing Letters, 56(1):27, 2024

work page 2024

[7] [7]

A variational pan-sharpening with local gradient constraints

Xueyang Fu, Zihuang Lin, Yue Huang, and Xinghao Ding. A variational pan-sharpening with local gradient constraints. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10265–10274, 2019. 3

work page 2019

[8] [8]

Nonlinear ihs: A promising method for pan-sharpening.IEEE Geoscience and Remote Sensing Letters, 13(11):1606–1610, 2016

Morteza Ghahremani and Hassan Ghassemian. Nonlinear ihs: A promising method for pan-sharpening.IEEE Geoscience and Remote Sensing Letters, 13(11):1606–1610, 2016. 3

work page 2016

[9] [9]

A. R. Gillespie, A. B. Kahle, and R. E. Walker. Color en- hancement of highly correlated images. ii. channel ratio and ”chromaticity” transformation techniques - sciencedirect.Re- mote Sensing of Environment, 22(3):343–365, 1987. 7

work page 1987

[10] [10]

Color enhancement of highly correlated images

Alan R Gillespie, Anne B Kahle, and Richard E Walker. Color enhancement of highly correlated images. ii. channel ratio and ”chromaticity” transformation techniques.Remote Sensing of Environment, 22(3):343–365, 1987. 3

work page 1987

[11] [11]

Haydn, G

R. Haydn, G. W. Dalke, J. Henkel, and J. E. Bare. Application of the ihs color transform to the processing of multisensor data and image enhancement.National Academy of Sciences of the United States of America, 79(13):571–577, 1982. 7

work page 1982

[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[13] Craig A Laben and Bernard V Brower. Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening, 2000. US Patent 6,011,875.

[14] Zongying Lai, Xiaobo Qu, Yunsong Liu, Di Guo, Jing Ye, Zhifang Zhan, and Zhong Chen. Image reconstruction of compressed sensing MRI using graph-based redundant wavelet transform. Medical Image Analysis, 27:93–104, 2016.

[15] Wenzhi Liao, Xin Huang, Frieke Van Coillie, Guy Thoonen, Aleksandra Pižurica, Paul Scheunders, and Wilfried Philips. Two-stage fusion of thermal hyperspectral and visible RGB image by PCA and guided filter. In 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pages 1–4. IEEE, 2015.

[16] J. G. Liu. Smoothing filter-based intensity modulation: A spectral preserve image fusion technique for improving spatial details. International Journal of Remote Sensing, 21(18):3461–3472, 2000.

[17] Giuseppe Masi, Davide Cozzolino, Luisa Verdoliva, and Giuseppe Scarpa. Pansharpening by convolutional neural networks. Remote Sensing, 8(7):594, 2016.

[18] Xinran Qin, Yuhui Quan, and Hui Ji. Enhanced deep unrolling networks for snapshot compressive hyperspectral imaging. Neural Networks, 174:106250, 2024.

[19] Yuhui Quan, Xinran Qin, Tongyao Pang, and Hui Ji. Siamese cooperative learning for unsupervised image reconstruction from incomplete measurements. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4866–4879,

[20] Xin Tian, Yuerong Chen, Changcai Yang, and Jiayi Ma. Variational pansharpening by exploiting cartoon-texture similarities. IEEE Transactions on Geoscience and Remote Sensing, pages 1–16, 2021.

[21] Xin Tian, Kun Li, Zhongyuan Wang, and Jiayi Ma. VP-Net: An interpretable deep network for variational pansharpening. IEEE Transactions on Geoscience and Remote Sensing, pages 1–16, 2021.

[22] Jiangang Wang, Yuning Cui, Yawen Li, Wenqi Ren, and Xiaochun Cao. Omnidirectional image super-resolution via bi-projection fusion. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5454–5462, 2024.

[23] Jiangang Wang, Qingnan Fan, Jinwei Chen, Hong Gu, Feng Huang, and Wenqi Ren. RAP-SR: Restoration prior enhancement in diffusion models for realistic image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.

[24] Zhong-Cheng Wu, Ting-Zhu Huang, Liang-Jian Deng, Jin-Fan Hu, and Gemine Vivone. VO+Net: An adaptive approach using variational optimization and deep learning for panchromatic sharpening. IEEE Transactions on Geoscience and Remote Sensing, pages 1–16, 2021.

[25] Qi Xie, Minghao Zhou, Qian Zhao, Zongben Xu, and Deyu Meng. MHF-Net: An interpretable deep network for multispectral and hyperspectral image fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1457–1473,

[26] Shuang Xu, Jiangshe Zhang, Zixiang Zhao, Kai Sun, Junmin Liu, and Chunxia Zhang. Deep gradient projection networks for pan-sharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1366–1375, 2021.

[27] Gang Yang, Xiangyong Cao, Wenzhe Xiao, Man Zhou, Aiping Liu, Xun Chen, and Deyu Meng. PanFlowNet: A flow-based deep network for pan-sharpening. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 16857–16867, 2023.

[28] Junfeng Yang, Xueyang Fu, Yuwen Hu, Yue Huang, Xinghao Ding, and John Paisley. PanNet: A deep network architecture for pan-sharpening. In Proceedings of the IEEE International Conference on Computer Vision, pages 5449–5457, 2017.

[29] Q. Yuan, Y. Wei, X. Meng, H. Shen, and L. Zhang. A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(3):978–989, 2018.

[30] Man Zhou, Jie Huang, Keyu Yan, Hu Yu, Xueyang Fu, Aiping Liu, Xian Wei, and Feng Zhao. Spatial-frequency domain information integration for pan-sharpening. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVIII, pages 274–291. Springer, 2022.

[31] Man Zhou, Keyu Yan, Jie Huang, Zihe Yang, Xueyang Fu, and Feng Zhao. Mutual information-driven pan-sharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1798–1808,