A Unified Foundation Model for All-in-One Multi-Modal Remote Sensing Image Restoration and Fusion with Language Prompting

Peng Liu; Yongchuan Cui

arxiv: 2604.05629 · v1 · submitted 2026-04-07 · 💻 cs.CV


Pith reviewed 2026-05-10 18:29 UTC · model grok-4.3


keywords remote sensing · image restoration · foundation model · mixture of experts · language prompting · multi-task learning · optimal transport · image fusion


The pith

LLaRS provides a single foundation model for handling eleven remote sensing restoration and fusion tasks using language prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Remote sensing images are degraded by clouds, haze, noise, and other issues, and each degradation type has typically required its own model. This paper presents LLaRS as a unified model that processes multiple modalities and tasks in one framework by aligning image bands semantically and routing features through specialized expert networks guided by text prompts. Training relies on LLaRS1M, a new million-scale dataset covering real and synthetic degradations. Experiments indicate it outperforms seven competitive baselines and adapts to unseen data through parameter-efficient finetuning. This matters because it could streamline the processing of vast satellite imagery archives without maintaining many different tools.

Core claim

LLaRS is presented as the first unified foundation model for multi-modal and multi-task remote sensing low-level vision. It aligns heterogeneous bands using Sinkhorn-Knopp optimal transport, routes features via three complementary mixture-of-experts layers for spatial patterns, spectral fidelity, and global context with low-rank adapters, and stabilizes training with step-level dynamic weight adjustment. Trained on the LLaRS1M dataset with eleven tasks and language prompts, it consistently outperforms seven competitive models and shows strong transfer capability through parameter-efficient finetuning on unseen data.
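To make the routing idea concrete, here is a minimal sketch of soft top-k gating over a set of experts, in plain Python. It is not LLaRS's implementation: the toy experts, the gate logits, and the names `softmax`, `top_k_gate`, and `moe_forward` are assumptions for illustration, whereas the paper's experts are convolutional, channel-mixing, and attention modules.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(logits, k):
    """Keep the k largest gate probabilities and renormalise them to sum to 1."""
    probs = softmax(logits)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs for one feature vector x."""
    weights = top_k_gate(gate_logits, k)
    out = [0.0] * len(x)
    for idx, w in weights.items():
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, weights

# Toy experts standing in for real expert modules (hypothetical).
experts = [
    lambda v: [2.0 * a for a in v],   # expert 0: scale
    lambda v: [a + 1.0 for a in v],   # expert 1: shift
    lambda v: [-a for a in v],        # expert 2: negate
]
x = [1.0, 2.0]
gate_logits = [2.0, 1.0, -1.0]  # hypothetical; a real gate computes these from features
y, weights = moe_forward(x, experts, gate_logits, k=2)
```

With `k=2` only the two highest-scoring experts contribute, which is the usual way a mixture-of-experts layer keeps compute bounded while still letting different inputs reach different experts.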

What carries the argument

The LLaRS architecture, which uses Sinkhorn-Knopp optimal transport for band alignment combined with three complementary mixture-of-experts layers and dynamic weighting for joint multi-task optimization.

If this is right

LLaRS can replace multiple task-specific models for remote sensing image restoration and fusion.
It achieves better performance than seven existing competitive models across the tasks.
Parameter-efficient finetuning enables effective adaptation to unseen data.
The use of language prompts allows flexible control over the restoration process.
Joint training on the LLaRS1M dataset supports consistent performance without major trade-offs between tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Operational remote sensing systems could integrate this model to reduce the complexity of handling diverse degradation types in a single pipeline.
Natural language interfaces might enable users without deep technical expertise to request specific image enhancements directly.
The band alignment technique could be tested for applicability in other multi-spectral domains such as hyperspectral medical imaging.
Further scaling of the model size or dataset might lead to even broader generalization across sensors and conditions.

Load-bearing premise

The combination of Sinkhorn-Knopp band alignment, three complementary MoE layers, and step-level dynamic weighting can jointly optimize across eleven heterogeneous restoration tasks without requiring separate models due to performance trade-offs.

What would settle it

If separate models trained individually for each of the eleven tasks outperform LLaRS on a standard benchmark test set, or if LLaRS shows degraded performance on some tasks compared to specialized approaches, the unified model's advantage would be disproven.
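The test described above amounts to a per-task comparison. A minimal sketch of that check follows; the function name `compare_per_task`, the tolerance, and all scores are placeholders invented to exercise the code, not results from the paper.

```python
def compare_per_task(unified, specialists, higher_is_better=True, tol=0.0):
    """Return (task, gap) pairs where the specialist beats the unified model by more than tol."""
    losers = []
    for task, u_score in unified.items():
        s_score = specialists[task]
        gap = (s_score - u_score) if higher_is_better else (u_score - s_score)
        if gap > tol:
            losers.append((task, gap))
    # Largest regression first.
    return sorted(losers, key=lambda t: t[1], reverse=True)

# Placeholder PSNR values in dB, made up purely for illustration.
unified = {"denoising": 30.0, "dehazing": 28.0, "destriping": 33.0}
specialists = {"denoising": 29.5, "dehazing": 28.6, "destriping": 32.0}
regressions = compare_per_task(unified, specialists, tol=0.1)
```

An empty result on real per-task numbers would support the unified claim; any listed task is a candidate case of negative transfer worth inspecting.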

Figures

Figures reproduced from arXiv: 2604.05629 by Peng Liu, Yongchuan Cui.

**Figure 2.** Figure 2: Overall architecture of LLaRS. […] models often struggle with input misalignment across different spectral channel configurations and spatial sampling intervals, leading to semantic ambiguity. Developing unified architectures that inherently align heterogeneous multimodal inputs and designing novel training paradigms specifically for pixel-level dense prediction remain unresolved challenges in this domain.… view at source ↗

**Figure 3.** Figure 3: Entropy-regularized channel-to-slot matching. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Geographic distribution of LLaRS1M sampling locations. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: (left) lists per-task sample totals and prompt counts. The magnitudes follow how each corpus is built: large cropped synthetic dehazing sets and multi-site super-resolution archives contribute high counts, whereas paired … [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Word cloud of all prompts in LLaRS1M. […] Landsat–MODIS series for spatiotemporal fusion are comparatively few; six simulation pipelines each draw a fixed budget from shared clean references. Prompt pool sizes track how richly a task can be verbalized: for instance, cloud removal spans thin versus thick clouds and auxiliary cues, whereas dehaze wording stays closer to a shared lexical core. (Right) We encode… view at source ↗

**Figure 7.** Figure 7: Model predictions and error maps for denoising. [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗

**Figure 8.** Figure 8: Model predictions and error maps for destriping. [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗

**Figure 11.** Figure 11: Relationship between trainable parameter ratio and aver [PITH_FULL_IMAGE:figures/full_fig_p007_11.png] view at source ↗

**Figure 9.** Figure 9: Model predictions and error maps for spatiotemporal [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 10.** Figure 10: Comparison of t-SNE task feature separability across [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗

**Figure 12.** Figure 12: Evolution of channel-to-slot transport for eleven tasks. ( [PITH_FULL_IMAGE:figures/full_fig_p008_12.png] view at source ↗

**Figure 13.** Figure 13: Evolution of per-task weight changing. […] the number of experts, multi-task optimization strategies, and model efficiency are provided in Sec. C. Section 5 (Conclusion): This work presents LLaRS, a multi-task foundation model for remote sensing low-level vision. We built LLaRS1M, a large-scale dataset with real pairs and synthetic degradations across eleven restoration tasks, paired with diverse language prompts. Expe… view at source ↗

**Figure 14.** Figure 14: LLaRS1M examples. Plot residue: legend LLaRS, w/o channel align, w/o text prompt; bar labels PSNR 37.60, 35.76, 32.52; SSIM 0.9172, 0.9046, 0.7562; SAM 0.0644, 0.0726, 0.1463; ERGAS 4.41, 4.66, 17.08. [PITH_FULL_IMAGE:figures/full_fig_p015_14.png] view at source ↗

**Figure 15.** Figure 15: Contribution analysis of text prompt and OT-based chan [PITH_FULL_IMAGE:figures/full_fig_p015_15.png] view at source ↗

**Figure 16.** Figure 16: Model predictions and error maps for deblurring. [PITH_FULL_IMAGE:figures/full_fig_p016_16.png] view at source ↗

**Figure 19.** Figure 19: Model predictions and error maps for histogram equal [PITH_FULL_IMAGE:figures/full_fig_p017_19.png] view at source ↗

**Figure 20.** Figure 20: Model predictions and error maps for brightness en [PITH_FULL_IMAGE:figures/full_fig_p017_20.png] view at source ↗

**Figure 21.** Figure 21: Fine-tuning qualitative comparison for dehazing. [PITH_FULL_IMAGE:figures/full_fig_p018_21.png] view at source ↗

**Figure 22.** Figure 22: Fine-tuning qualitative comparison for super-resolution. [PITH_FULL_IMAGE:figures/full_fig_p019_22.png] view at source ↗

**Figure 23.** Figure 23: Fine-tuning qualitative comparison for SAR despeckling. [PITH_FULL_IMAGE:figures/full_fig_p020_23.png] view at source ↗
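The plot labels in the figure list above include PSNR, SSIM, SAM, and ERGAS. As a reminder of what two of these measure, here is a minimal sketch of PSNR and SAM under their textbook definitions. The flat-list image layout, the `data_range` default, and the function names are assumptions for the example; the paper's exact evaluation protocol is not given on this page.

```python
import math

def psnr(ref, est, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat lists of pixel values."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(data_range ** 2 / mse)

def sam(ref_pixels, est_pixels):
    """Mean spectral angle in radians between per-pixel spectra (assumed non-zero)."""
    angles = []
    for r, e in zip(ref_pixels, est_pixels):
        dot = sum(a * b for a, b in zip(r, e))
        norm = math.sqrt(sum(a * a for a in r)) * math.sqrt(sum(b * b for b in e))
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.acos(cos))
    return sum(angles) / len(angles)

# Tiny hand-made inputs, not data from the paper.
example_psnr = psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
example_sam = sam([[1.0, 0.0]], [[0.0, 1.0]])
```

Higher PSNR is better, while a lower SAM means the restored spectra point in nearly the same direction as the reference, which is why SAM is commonly reported for multispectral data.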

Original abstract

Remote sensing imagery suffers from clouds, haze, noise, resolution limits, and sensor heterogeneity. Existing restoration and fusion approaches train separate models per degradation type. In this work, we present Language-conditioned Large-scale Remote Sensing restoration model (LLaRS), the first unified foundation model for multi-modal and multi-task remote sensing low-level vision. LLaRS employs Sinkhorn-Knopp optimal transport to align heterogeneous bands into semantically matched slots, routes features through three complementary mixture-of-experts layers (convolutional experts for spatial patterns, channel-mixing experts for spectral fidelity, and attention experts with low-rank adapters for global context), and stabilizes joint training via step-level dynamic weight adjustment. To train LLaRS, we construct LLaRS1M, a million-scale multi-task dataset spanning eleven restoration and enhancement tasks, integrating real paired observations and controlled synthetic degradations with diverse natural language prompts. Experiments show LLaRS consistently outperforms seven competitive models, and parameter-efficient finetuning experiments demonstrate strong transfer capability and adaptation efficiency on unseen data. Repo: https://github.com/yc-cui/LLaRS

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLaRS puts together a usable single model for multiple remote sensing restoration tasks with a new large dataset, but the gains need checking against the actual numbers and ablations.


The core contribution is a single model called LLaRS that tackles eleven different low-level tasks in remote sensing (dehazing, denoising, super-resolution, fusion, and so on) using language prompts to steer it. They built LLaRS1M, a million-scale dataset mixing real paired data with synthetic degradations, and trained the model end-to-end instead of the usual separate networks per problem. That unification is the practical point for anyone dealing with varied satellite imagery.

The architecture combines Sinkhorn-Knopp alignment for mismatched bands, three MoE branches (convolutional for spatial detail, channel-mixing for spectra, and low-rank attention for context), plus step-wise dynamic weighting to keep the tasks from fighting each other during training. Those pieces are not brand new individually, but the combination for this domain and the language conditioning are what they add. The reported outperformance over seven baselines and the parameter-efficient fine-tuning results on unseen data suggest the setup works at least on their test splits. The dataset itself looks like something others could build on if the construction details hold up.

The soft spots are the usual ones for this kind of empirical work. Joint training across heterogeneous tasks can still produce uneven results even with dynamic weights, so the paper needs clear per-task metrics and ablations showing no single expert dominates or that removing any component hurts. How closely the synthetic degradations match real sensor artifacts is always worth scrutiny in remote sensing. The repo is mentioned, which helps, but independent verification of the transfer claims would strengthen it.

This paper is for researchers and practitioners in remote sensing image processing who want fewer models in their pipelines. A reader focused on low-level vision for earth observation would find the dataset and the all-in-one framing useful even if they adapt the architecture. It deserves peer review because the problem is real, the approach is concrete, and the claims are falsifiable with the provided code and data.
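One way to check the letter's concern that a single expert might dominate is to look at how routing mass is spread across experts. The sketch below computes a per-expert load and a normalised entropy score; the gate values and the function names `expert_load` and `normalized_entropy` are hypothetical, and the paper may measure balance differently or not at all.

```python
import math

def expert_load(routing_weights):
    """Average gate weight per expert over a batch of tokens (each row sums to 1)."""
    n_tokens = len(routing_weights)
    n_experts = len(routing_weights[0])
    return [sum(row[e] for row in routing_weights) / n_tokens for e in range(n_experts)]

def normalized_entropy(load):
    """1.0 means perfectly balanced experts; values near 0 mean one expert dominates."""
    h = -sum(p * math.log(p) for p in load if p > 0)
    return h / math.log(len(load))

# Hypothetical gate outputs for four tokens and three experts.
routing = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.9, 0.05, 0.05],
    [0.6, 0.3, 0.1],
]
load = expert_load(routing)
balance = normalized_entropy(load)
```

Tracking a score like `balance` per MoE layer during training would give the kind of evidence the letter asks for, alongside the component ablations.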

Referee Report

2 major / 3 minor

Summary. The paper introduces LLaRS, the first unified foundation model for multi-modal and multi-task remote sensing low-level vision tasks including restoration and fusion. It employs Sinkhorn-Knopp optimal transport for aligning heterogeneous bands, routes features through three complementary mixture-of-experts layers (convolutional for spatial patterns, channel-mixing for spectral fidelity, and attention with low-rank adapters for global context), and uses step-level dynamic weight adjustment for stable joint training. A new million-scale dataset LLaRS1M is constructed covering eleven tasks with real and synthetic degradations plus language prompts. Experiments claim consistent outperformance over seven competitive models and strong transfer via parameter-efficient finetuning on unseen data.

Significance. If the empirical results hold, the work is significant for establishing a single model capable of handling eleven heterogeneous remote sensing restoration and fusion tasks without task-specific retraining, supported by a large-scale multi-task dataset and an architecture designed for joint optimization. This could reduce the proliferation of separate models in the field and enable more efficient adaptation through language prompting and PEFT, advancing foundation-model approaches in remote sensing low-level vision.

major comments (2)

[§4] §4 (Experiments) and associated tables: the central claim of consistent outperformance and absence of task-specific trade-offs relies on quantitative comparisons across all eleven tasks, but the reported results must include per-task metrics, ablation on the three MoE branches plus dynamic weighting, and direct comparison to task-specific baselines trained on the same LLaRS1M data to confirm no negative transfer occurs.
[§3.2] §3.2 (Architecture): the step-level dynamic weight adjustment is presented as stabilizing joint training, but the paper should provide the exact formulation of the weighting parameters and demonstrate via ablation that they are not merely fitting to the training distribution in a way that reduces the claimed generality.
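The exact formulation requested in the second comment is not reproduced on this page. For orientation only, the sketch below shows Dynamic Weight Average (DWA), a well-known loss-ratio weighting scheme, applied between consecutive steps. It is a stand-in and not LLaRS's rule: the temperature, the per-step usage, and the loss values are all assumptions.

```python
import math

def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic Weight Average: tasks whose loss is falling slowly get more weight."""
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    k = len(ratios)
    return [k * e / total for e in exps]  # weights sum to the number of tasks

# Hypothetical per-task losses at steps t-2 and t-1 for three tasks.
losses_t2 = [1.00, 1.00, 1.00]
losses_t1 = [0.50, 0.90, 1.00]   # task 0 improves fastest, task 2 has stalled
weights = dwa_weights(losses_t1, losses_t2)
total_loss = sum(w * l for w, l in zip(weights, losses_t1))
```

Whatever rule the paper uses, an ablation against a fixed-weight baseline and a published scheme of this kind would show whether the step-level mechanism adds anything beyond generic loss balancing.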

minor comments (3)

[Figure 2] Figure 2 and §3: the diagram of the three MoE layers and Sinkhorn-Knopp alignment would benefit from clearer annotation of input/output dimensions and how language prompts are injected at each stage.
[§5] §5 (Transfer experiments): the parameter-efficient finetuning results on unseen data should report the number of trainable parameters and adaptation steps for transparency.
[References] References: several recent works on multi-task remote sensing restoration and MoE in vision are missing; add citations to ensure the positioning against prior unified models is complete.
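To show why the trainable-parameter count requested above is worth reporting, here is a small sketch of the arithmetic for a low-rank adapter on one dense layer, plus the adapted forward pass. The layer size, rank, scaling convention (`alpha / rank`), and function names are assumptions for illustration and are not taken from the paper.

```python
def lora_param_counts(d_in, d_out, rank):
    """Parameter counts for a dense layer W (d_out x d_in) with a rank-r adapter B @ A."""
    full = d_in * d_out                 # frozen base weight
    adapter = rank * (d_in + d_out)     # A is (rank x d_in), B is (d_out x rank)
    return full, adapter, adapter / (full + adapter)

def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x), using plain nested lists."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]

# Hypothetical 768 x 768 layer with a rank-8 adapter.
full, adapter, ratio = lora_param_counts(d_in=768, d_out=768, rank=8)

# Tiny forward-pass example with a 2 x 2 identity base weight and a rank-1 adapter.
y = lora_forward([1.0, 2.0], [[1, 0], [0, 1]], [[1, 1]], [[1], [0]], alpha=2, rank=1)
```

In this toy setting the adapter is about 2% of the layer's parameters, which is the sort of figure that makes transfer results comparable across finetuning methods.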

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The points raised strengthen the empirical support for our claims of unified multi-task performance and the role of the dynamic weighting mechanism. We address each major comment below.


Referee: [§4] §4 (Experiments) and associated tables: the central claim of consistent outperformance and absence of task-specific trade-offs relies on quantitative comparisons across all eleven tasks, but the reported results must include per-task metrics, ablation on the three MoE branches plus dynamic weighting, and direct comparison to task-specific baselines trained on the same LLaRS1M data to confirm no negative transfer occurs.

Authors: We agree that per-task metrics and targeted ablations are necessary to fully substantiate the absence of task-specific trade-offs. The submitted manuscript reported aggregated metrics to highlight overall trends; in the revision we will add complete per-task tables for all eleven tasks. We will also include ablations isolating each of the three MoE branches (convolutional, channel-mixing, and attention with low-rank adapters) and the dynamic weighting component. In addition, we will train task-specific baselines on the identical LLaRS1M data and report direct comparisons, thereby confirming that joint training yields no negative transfer relative to specialized models.
revision: yes
Referee: [§3.2] §3.2 (Architecture): the step-level dynamic weight adjustment is presented as stabilizing joint training, but the paper should provide the exact formulation of the weighting parameters and demonstrate via ablation that they are not merely fitting to the training distribution in a way that reduces the claimed generality.

Authors: We will insert the exact mathematical formulation of the step-level dynamic weight adjustment, including the update rules for the weighting parameters, into §3.2. To address the concern about potential overfitting, we will add an ablation that trains the model both with and without dynamic weighting. Performance will be reported on held-out validation splits of LLaRS1M as well as on completely unseen tasks and data distributions. These results will show that the mechanism improves training stability while maintaining or improving generalization, rather than trading generality for in-distribution fit.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents an empirical architecture for a unified remote sensing restoration model using standard components (Sinkhorn-Knopp alignment, mixture-of-experts layers, dynamic weighting) trained on a newly constructed million-scale dataset LLaRS1M. No equations, derivations, or self-referential definitions are provided that reduce claimed performance or unification to fitted parameters or prior self-citations by construction. Central claims rest on experimental outperformance and transfer results rather than internal circular logic.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of the described architecture and dataset. No explicit free parameters, axioms, or invented entities are stated in the abstract, but the dynamic weight adjustment and expert routing implicitly introduce tunable components whose values are learned from data.

free parameters (1)

step-level dynamic weight adjustment parameters
Used to stabilize joint training across tasks; values are learned or scheduled during optimization.

pith-pipeline@v0.9.0 · 5497 in / 1351 out tokens · 39816 ms · 2026-05-10T18:29:26.799763+00:00 · methodology


Reference graph

Works this paper leans on

117 extracted references · 117 canonical work pages

[1]

SatlasPretrain: A large-scale dataset for remote sensing image understanding

Favyen Bastani, Piper Wolters, Ritwik Gupta, Joe Ferdinando, and Aniruddha Kembhavi. SatlasPretrain: A large-scale dataset for remote sensing image understanding. InICCV, pages 16726–16736, 2023. 2

work page 2023
[2]

BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models

Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. InACL, pages 1–9, Dublin, Ireland,

work page
[3]

Association for Computational Linguistics. 7

work page
[4]

Unsupervised learn- ing of visual features by contrasting cluster assignments

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learn- ing of visual features by contrasting cluster assignments. In NeurIPS, pages 9912–9924, 2020. 2

work page 2020
[5]

Pre-trained image processing transformer

Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer. In CVPR, pages 12294–12305, 2021. 2

work page 2021
[6]

Dynamic convolution: Attention over convolution kernels

Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. Dynamic convolution: Attention over convolution kernels. InCVPR, pages 11030–11039,

work page
[7]

GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks

Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. InICML, pages 794–803. PMLR, 2018. 15

work page 2018
[8]

Trinity-Net: Gradient- guided swin transformer-based remote sensing image dehaz- ing and beyond.IEEE Trans

Kaichen Chi, Yuan Yuan, and Qi Wang. Trinity-Net: Gradient- guided swin transformer-based remote sensing image dehaz- ing and beyond.IEEE Trans. Geosci. Remote Sens., 61:1–14,

work page
[9]

Conde, Gregor Geigle, and Radu Timofte

Marcos V . Conde, Gregor Geigle, and Radu Timofte. In- structIR: High-quality image restoration following human instructions. InECCV, page 1–21, Berlin, Heidelberg, 2024. Springer-Verlag. 1, 2

work page 2024
[10]

Lobell, and Stefano Ermon

Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, and Stefano Ermon. SatMAE: Pre-training transformers for tem- poral and multi-spectral satellite imagery. InNeurIPS, Red Hook, NY , USA, 2022. Curran Associates Inc. 2

work page 2022
[11]

Enpowering your pansharpening models with generalizability: Unified distri- bution is all you need

Yongchuan Cui, Peng Liu, and Hui Zhang. Enpowering your pansharpening models with generalizability: Unified distri- bution is all you need. InICCV, pages 11850–11860, 2025. 1

work page 2025
[12]

Sinkhorn distances: Lightspeed computation of optimal transport

Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InNeurIPS, pages 2292–2300, 2013. 2, 3, 4, 8, 14

work page 2013
[13]

TerraFM: A scalable foundation model for unified multisensor earth observation.arXiv, 2025

Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Muhammad Haris Khan, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan, and Salman Khan. TerraFM: A scalable foundation model for unified multisensor earth observation.arXiv, 2025. 2

work page 2025
[14]

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database . InCVPR, pages 248–255, Los Alamitos, CA, USA, 2009. IEEE Computer Society. 2

work page 2009
[15]

Machine learning in pansharpening: A benchmark, from shallow to deep networks.IEEE Geosci

Liang-Jian Deng, Gemine Vivone, Mercedes E Paoletti, Giuseppe Scarpa, Jiang He, Yongjun Zhang, Jocelyn Chanus- sot, and Antonio Plaza. Machine learning in pansharpening: A benchmark, from shallow to deep networks.IEEE Geosci. Remote Sens. Mag., 10(3):279–315, 2022. 12, 13, 14

work page 2022
[16]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021. 2

work page 2021
[17]

Multisensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 Imagery.IEEE Trans

Patrick Ebel, Andrea Meraner, Michael Schmitt, and Xiao Xi- ang Zhu. Multisensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 Imagery.IEEE Trans. Geosci. Remote Sens., 59(7):5866–5878, 2021. 12, 13, 14

work page 2021
[18]

Emelyanova, Tim R

Irina V . Emelyanova, Tim R. McVicar, Thomas G. Van Niel, Ling Tao Li, and Albert I.J.M. van Dijk. Assessing the accu- racy of blending landsat–modis surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection.Remote Sens. Environ., 133:193–209, 2013. 12, 13, 14

work page 2013
[19]

Ro- bust SAR image despeckling by deep learning from near-real datasets.IEEE J

Jianjun Guan, Ping Zhong, Fan Zhang, and Yuhan Liu. Ro- bust SAR image despeckling by deep learning from near-real datasets.IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 17:3475–3487, 2024. 12, 13, 14

work page 2024
[20]

SkySense: A multi-modal remote sensing foundation model towards universal inter- pretation for earth observation imagery

Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingx- iang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, and Yansheng Li. SkySense: A multi-modal remote sensing foundation model towards universal inter- pretation for earth observation imagery. InCVPR, pages 27662–27673, 2024. 2

work page 2024
[21]

Wasserstein wormhole: Scalable optimal transport distance with transformer

Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, Cas- sandra Burdziak, Tal Nawy, Anna Gilbert, and Dana Pe’er. Wasserstein wormhole: Scalable optimal transport distance with transformer. InICML, pages 17697–17718. PMLR, 2024. 2

work page 2024
[22]

Diffusion models in low-level vision: A survey, 2024

Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey, 2024. 1

work page 2024
[23]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InCVPR, pages 770–778, 2016. 13

work page 2016
[24]

Parameter-efficient transfer learning for NLP

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for NLP. InICML, pages 2790–2799. PMLR, 2019. 7

work page 2019
[25]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InICLR,

work page
[26]

Single satellite optical imagery dehazing using sar image prior based on conditional generative adversarial net- works

Binghui Huang, Li Zhi, Chao Yang, Fuchun Sun, and Yixu Song. Single satellite optical imagery dehazing using sar image prior based on conditional generative adversarial net- works. InWACV, pages 1806–1813, 2020. 12, 13, 14

work page 2020
[27]

Transformer fusion with optimal transport

Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hof- mann, Sotiris Anagnostidis, and Sidak Pal Singh. Transformer fusion with optimal transport. InICLR, 2024. 2 9

work page 2024
[28]

Optimal transport aggre- gation for visual place recognition

Sergio Izquierdo and Javier Civera. Optimal transport aggre- gation for visual place recognition. InCVPR, pages 17658– 17668, 2024. 2

work page 2024
[29]

Jacobs, Michael I

Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts.Neu- ral Comput., 3(1):79–87, 1991. 3, 5

work page 1991
[30]

All-In-One Image Restoration for Unknown Corruption

Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-In-One Image Restoration for Unknown Corruption . InCVPR, pages 17431–17441, Los Alamitos, CA, USA, 2022. IEEE Computer Society. 2

work page 2022
[31]

Spatio-temporal fusion for remote sensing data: An overview and new benchmark.Sci

Jun Li, Yunfei Li, Lin He, Jin Chen, and Antonio Plaza. Spatio-temporal fusion for remote sensing data: An overview and new benchmark.Sci. China Inf. Sci., 63(4):140301, 2020. 12, 13, 14

work page 2020
[32]

Tan, and Loong-Fah Cheong

Ruoteng Li, Robby T. Tan, and Loong-Fah Cheong. All in one bad weather removal using architectural search. InCVPR, pages 3172–3182, 2020. 2

work page 2020
[33]

Scaling & shifting your features: a new baseline for efficient model tuning

Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & shifting your features: a new baseline for efficient model tuning. InNeurIPS, Red Hook, NY , USA, 2022. Curran Associates Inc. 7

work page 2022
[34]

A remote sensing image dataset for cloud removal, 2019

Daoyu Lin, Guangluan Xu, Xiaoke Wang, Yang Wang, Xian Sun, and Kun Fu. A remote sensing image dataset for cloud removal, 2019. 12, 13, 14

work page 2019
[35]

Conflict-averse gradient descent for multi-task learning

Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning. InNeurIPS, Red Hook, NY , USA, 2021. Curran Associates Inc. 4, 15

work page 2021
[36]

Dora: weight-decomposed low-rank adaptation

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: weight-decomposed low-rank adaptation. InICML. JMLR.org, 2024. 7

work page 2024
[37]

Degae: A new pretraining paradigm for low-level vision

Yihao Liu, Jingwen He, Jinjin Gu, Xiangtao Kong, Yu Qiao, and Chao Dong. Degae: A new pretraining paradigm for low-level vision. InCVPR, pages 23292–23303, 2023. 2

work page 2023
[38]

Ai foundation models in remote sensing: A survey, 2024

Siqi Lu, Junlin Guo, James R Zimmer-Dauphinee, Jordan M Nieusma, Xiao Wang, Parker VanValkenburgh, Steven A Wernke, and Yuankai Huo. Ai foundation models in remote sensing: A survey, 2024. 1

work page 2024
[39]

Gustafsson, Zheng Zhao, Jens Sj¨olund, and Thomas B

Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sj¨olund, and Thomas B. Sch¨on. Controlling vision-language models for multi-task image restoration. InICLR, 2024. 2

work page 2024
[40]

Visualizing data using t-SNE.J

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE.J. Mach. Learn. Res., 9(86):2579–2605,

work page
[41]

Ardakani, and Angel D

Armin Mehri, Parichehr B. Ardakani, and Angel D. Sappa. MPRNet: Multi-path residual network for lightweight image super resolution. InWACV, pages 2703–2712, 2021. 6, 13, 16

work page 2021
[42]

A large-scale benchmark data set for evalu- ating pansharpening performance: Overview and implementa- tion.IEEE Geosci

Xiangchao Meng, Yiming Xiong, Feng Shao, Huanfeng Shen, Weiwei Sun, Gang Yang, Qiangqiang Yuan, Randi Fu, and Hongyan Zhang. A large-scale benchmark data set for evalu- ating pansharpening performance: Overview and implementa- tion.IEEE Geosci. Remote Sens. Mag., 9(1):18–52, 2021. 12, 13, 14

work page 2021
[43]

Sen2ven µs, a dataset for the training of sentinel-2 super-resolution algorithms.Data, 7(7):96, 2022

Julien Michel, Juan Vinasco-Salinas, Jordi Inglada, and Olivier Hagolle. Sen2ven µs, a dataset for the training of sentinel-2 super-resolution algorithms.Data, 7(7):96, 2022. 12, 13, 14

work page 2022
[44]

Multi-task learning as a bargaining game

Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, and Ethan Fetaya. Multi-task learning as a bargaining game. InICML, pages 16428–16446. PMLR, 2022. 4, 15

work page 2022
[45]

Learning dual convolutional neural networks for low-level vision

Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, and Ming-Hsuan Yang. Learning dual convolutional neural networks for low-level vision. InCVPR, pages 3070– 3079, 2018. 2

[46]

Vaishnav Potlapalli, Syed Waqas Zamir, Salman Khan, and Fahad Shahbaz Khan. PromptIR: Prompting for all-in-one blind image restoration. In NeurIPS, Red Hook, NY, USA. Curran Associates Inc.

[48]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241, Cham, 2015. Springer International Publishing.

[49]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019.

[50]

Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. SuperGlue: Learning feature matching with graph neural networks. In CVPR, pages 4938–4947,

[51]

Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. In NeurIPS, pages 525–536, Red Hook, NY, USA, 2018. Curran Associates Inc.

[52]

Richard Sinkhorn and Paul Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pac. J. Math., 21(2):343–348, 1967.

[53]

Jialu Sui, Yiyang Ma, Wenhan Yang, Xiaokang Zhang, Man-On Pun, and Jiaying Liu. Diffusion enhancement for cloud removal in ultra-resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens., 62:1–14, 2024.

[54]

Xian Sun, Peijin Wang, Wanxuan Lu, Zicong Zhu, Xiaonan Lu, Qibin He, Junxi Li, Xuee Rong, Zhujun Yang, Hao Chang, Qinglin He, Guang Yang, Ruiping Wang, Jiwen Lu, and Kun Fu. RingMo: A remote sensing foundation model with masked image modeling. IEEE Trans. Geosci. Remote Sens., 61:1–22, 2023.

[55]

Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M. Patel. TransWeather: Transformer-based restoration of images degraded by adverse weather conditions. In CVPR, pages 2353–2363, 2022.

[56]

Rubén Darío Vásquez-Salazar, Ahmed Alejandro Cardona-Mesa, Luis Gómez, Carlos M Travieso-González, Andrés F Garavito-González, and Esteban Vásquez-Cano. Labeled dataset for training despeckling filters for SAR imagery. Data Brief., 53:110065, 2024.

[57]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, pages 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc.

[58]

Junwei Wang, Kun Gao, Zhenzhou Zhang, Chong Ni, Zibo Hu, Dayu Chen, and Qiong Wu. Multisensor remote sensing imagery super-resolution with conditional GAN. J. Remote Sens., 2021, 2021.

[59]

Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tong Lu, Tae-Kyun Kim, Wei Liu, and Hongdong Li. GridFormer: Residual dense transformer with grid structure for image restoration in adverse weather conditions. IJCV, 132(10):4541–4563, 2024.

[60]

Jiawei Wu, Zhifei Yang, Zhe Wang, and Zhi Jin. Gradient as conditions: Rethinking HOG for all-in-one image restoration. AAAI, 40(13):10682–10690, 2026.

[61]

Zhenda Xie, Yixuan Wei, Huanqi Cao, Chenggang Zhao, Chengqi Deng, Jiashi Li, Damai Dai, Huazuo Gao, Jiang Chang, Kuai Yu, Liang Zhao, Shangyan Zhou, Zhean Xu, Zhengyan Zhang, Wangding Zeng, Shengding Hu, Yuqing Wang, Jingyang Yuan, Lean Wang, and Wenfeng Liang. mHC: Manifold-constrained hyper-connections. arXiv, 2025.

[62]

Brandon Yang, Gabriel Bender, Quoc V Le, and Jiquan Ngiam. CondConv: Conditionally parameterized convolutions for efficient inference. In NeurIPS, 2019.

[63]

Yongyi Yang and Jianyang Gao. mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations. arXiv, 2026.

[64]

Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Yi, Hui Zhang, Dan Zhao, Bingzheng Wei, and Yan Xu. All-In-One Medical Image Restoration via Task-Adaptive Routing. In MICCAI. Springer Nature Switzerland, 2024.

[65]

Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. Gradient surgery for multi-task learning. In NeurIPS, Red Hook, NY, USA, 2020. Curran Associates Inc.

[66]

Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu. ChatEarthNet: a global-scale image–text dataset empowering vision–language geo-foundation models. Earth Syst. Sci. Data, 17(3):1245–1263, 2025.

[67]

Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, and Radu Timofte. Complexity experts are task-discriminative learners for any image restoration. In CVPR, pages 12753–12763, 2025.

[68]

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In CVPR, pages 5728–5739, 2022.

[69]

Libao Zhang and Shan Wang. Dense haze removal based on dynamic collaborative inference learning for remote sensing images. IEEE Trans. Geosci. Remote Sens., 60:1–16, 2022.

[70]

Zilun Zhang, Tiancheng Zhao, Yulong Guo, and Jianwei Yin. RS5M and GeoRSCLIP: A large scale vision-language dataset and a large vision-language model for remote sensing. IEEE Trans. Geosci. Remote Sens., 62:1–23, 2024.

[71]

Yue Zhou, Litong Feng, Yiping Ke, Xue Jiang, Junchi Yan, Xue Yang, and Wayne Zhang. Towards vision-language geo-foundation model: A survey, 2024.

[72]

Zeng-Hui Zhu, Wei Lu, Si-Bao Chen, Chris H. Q. Ding, Jin Tang, and Bin Luo. Real-world remote sensing image dehazing: Benchmark and baseline. IEEE Trans. Geosci. Remote Sens., 63:1–14, 2025.

A. MoRA and softmax mixture approximation

This section gives the full tensor definitions behind the compact MoT/MoRA update in the main paper. With r...

Prompt examples

Remove the cloud layer to improve visibility of the surface
Apply SAR technology to mitigate cloud interference
The dense cloud cover is obstructing the view; remove it for clarity
Can you enhance the clarity of this image by removing the clouds?

Prompt examples HR

Apply haze removal techniques to reveal the landscape below
The hazes are blocking the view; please remove them
Remove the haze from this remote sensing image to improve visibility
Apply dehazing to this remote sensing image for better interpretation.
