Remote Sensing Image Super-Resolution for Imbalanced Textures: A Texture-Aware Diffusion Framework

Dilxat Muhtar; Enzhuo Zhang; Pengfeng Xiao; Sijie Zhao; Xueliang Zhang; Zhenshi Li

arxiv: 2604.13994 · v1 · submitted 2026-04-15 · 💻 cs.CV

Enzhuo Zhang, Sijie Zhao, Dilxat Muhtar, Zhenshi Li, Xueliang Zhang, Pengfeng Xiao

Pith reviewed 2026-05-10 13:19 UTC · model grok-4.3

classification 💻 cs.CV

keywords remote sensing image super-resolution · diffusion models · texture-aware processing · imbalanced textures · relative texture density map · high-frequency detail preservation

The pith

A relative texture density map conditions diffusion models to handle imbalanced textures in remote sensing super-resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix the fact that diffusion-based super-resolution works well on ordinary photos but struggles with remote sensing images, which show textures that appear random from far away yet form tight local clusters. The authors first compute a Relative Texture Density Map that records how much texture each small region contains compared with the whole image. They then insert this map into the diffusion pipeline in three linked ways: as extra input that tells the model where texture should appear, as a weight that makes the training loss pay more attention to detailed patches, and as a control that speeds up or slows down the generation steps in different areas. A reader who accepts this should expect the output images to keep genuine fine details while inventing fewer false texture patterns, which in turn improves accuracy when those images are fed into later tasks such as object detection or land-use mapping.
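This page does not give the paper's formula for the map, so what follows is only a minimal sketch of one way to read "how much texture each small region contains compared with the whole image". It assumes gradient magnitude as the texture measure, a fixed patch size, and a grayscale image stored as nested lists. The names `texture_energy` and `relative_texture_density` are hypothetical and are not taken from the TexADiff code.

```python
def texture_energy(img):
    """Per-pixel gradient magnitude using forward differences (assumed texture measure)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Replicate the last row/column so border pixels have a neighbour.
            dx = img[y][min(x + 1, w - 1)] - img[y][x]
            dy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = (dx * dx + dy * dy) ** 0.5
    return out


def relative_texture_density(img, patch=2, eps=1e-8):
    """Mean texture energy of each patch divided by the image-wide mean energy."""
    energy = texture_energy(img)
    h, w = len(img), len(img[0])
    global_mean = sum(map(sum, energy)) / (h * w)
    rtdm = []
    for y0 in range(0, h, patch):
        row = []
        for x0 in range(0, w, patch):
            vals = [energy[y][x]
                    for y in range(y0, min(y0 + patch, h))
                    for x in range(x0, min(x0 + patch, w))]
            # eps keeps a completely flat image from dividing by zero.
            row.append((sum(vals) / len(vals)) / (global_mean + eps))
        rtdm.append(row)
    return rtdm


# Toy image: flat on the left, checkerboard on the right.
image = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
density = relative_texture_density(image, patch=2)
```

Under this reading, values above 1 mark patches that are denser in texture than the image average and values below 1 mark smoother patches, which is the "relative" part of the name.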

Core claim

TexADiff estimates a Relative Texture Density Map to represent the unique stochastic-yet-clustered texture distribution of remote sensing images. It then applies this map as explicit spatial conditioning to guide the diffusion process, as a loss modulation term to prioritize texture-rich regions, and as a dynamic adapter for the sampling schedule. These three modifications together give the model explicit texture-aware capabilities.
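The loss modulation term is described here only in words. As a hedged illustration, the sketch below weights a per-pixel squared error by `1 + strength * rtdm`, so errors in texture-rich regions count for more. The weighting form, the `strength` parameter, and the function name `texture_weighted_mse` are assumptions made for this example; the paper's actual loss may differ, and in a diffusion model `pred` and `target` would typically be the predicted and true noise rather than images.

```python
def texture_weighted_mse(pred, target, rtdm, strength=1.0):
    """Squared error averaged with per-pixel weights 1 + strength * rtdm.

    pred, target and rtdm are same-shaped nested lists of floats.
    Dividing by the sum of weights keeps the loss scale comparable to
    plain MSE (a design choice of this sketch, not of the paper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for p_row, t_row, d_row in zip(pred, target, rtdm):
        for p, t, d in zip(p_row, t_row, d_row):
            w = 1.0 + strength * d
            weighted_sum += w * (p - t) ** 2
            weight_total += w
    return weighted_sum / weight_total


# The same unit error costs more in a dense pixel than in a smooth one.
density = [[2.0, 0.0]]
target = [[0.0, 0.0]]
loss_dense = texture_weighted_mse([[1.0, 0.0]], target, density)
loss_smooth = texture_weighted_mse([[0.0, 1.0]], target, density)
```

With a uniform map, or with `strength=0`, this reduces to ordinary mean squared error, which gives a convenient baseline for checking whether the weighting changes anything.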

What carries the argument

The Relative Texture Density Map (RTDM), a per-pixel field that measures local texture concentration relative to the global image, integrated via spatial conditioning, loss re-weighting, and schedule adaptation.

If this is right

The method records superior or competitive scores on standard quantitative metrics for remote sensing super-resolution.
Generated images show high-frequency details that match the original scene more closely and contain fewer invented texture patterns.
Super-resolved outputs produce measurable gains when used as input to downstream remote sensing tasks such as classification or detection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same density-map idea could be tried on other domains that also contain clustered irregular textures, such as certain medical or aerial photography datasets.
If the map can be estimated reliably from low-resolution input alone, the approach might allow smaller training sets by directing model capacity only to the regions that need it.
One could test whether the three integration strategies remain additive when the base diffusion model is replaced by a newer architecture.

Load-bearing premise

That the estimated Relative Texture Density Map reliably captures the actual imbalanced texture distribution in remote sensing images and that the three integration strategies work together without introducing new biases or artifacts.

What would settle it

Apply the full TexADiff pipeline to a test set of remote sensing images whose Relative Texture Density Map has been deliberately flattened to a uniform value; if the quantitative scores and hallucination rate then match those of an unmodified diffusion baseline, the claim that the map is doing essential work would be falsified.
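The control described above can be written as a small harness. This is a sketch of the protocol only: `run_model` and `score` are hypothetical callables standing in for the super-resolution pipeline and whichever metric is used, and neither is part of the released code as far as this page shows.

```python
def flatten_map(rtdm, value=1.0):
    """Replace every entry of the map with one constant, removing spatial information."""
    return [[value for _ in row] for row in rtdm]


def map_ablation(run_model, score, lr_image, hr_image, rtdm):
    """Score the same model once with the estimated map and once with a flat map.

    run_model(lr_image, rtdm) -> super-resolved image   (hypothetical signature)
    score(sr_image, hr_image) -> float, higher is better (hypothetical signature)
    """
    estimated = score(run_model(lr_image, rtdm), hr_image)
    flattened = score(run_model(lr_image, flatten_map(rtdm)), hr_image)
    return {"estimated": estimated, "flattened": flattened, "gap": estimated - flattened}
```

A gap near zero across a test set would suggest the map carries little information the model actually uses; a consistent positive gap would support the claim that the map is doing essential work.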

Figures

Figures reproduced from arXiv: 2604.13994 by Dilxat Muhtar, Enzhuo Zhang, Pengfeng Xiao, Sijie Zhao, Xueliang Zhang, Zhenshi Li.

**Figure 2.** RTDM derived directly from the LR-HR pair during …

**Figure 3.** Architecture of proposed TexADiff. During training, the extracted RTDM is combined with the LR input and noisy latent via a …

**Figure 4.** Image SR results (×4) on the synthetic scenario.

**Figure 5.** Image SR results (×4) on the Real-World scenario.

Original abstract

Generative diffusion priors have recently achieved state-of-the-art performance in natural image super-resolution, demonstrating a powerful capability to synthesize photorealistic details. However, their direct application to remote sensing image super-resolution (RSISR) reveals significant shortcomings. Unlike natural images, remote sensing images exhibit a unique texture distribution where ground objects are globally stochastic yet locally clustered, leading to highly imbalanced textures. This imbalance severely hinders the model's spatial perception. To address this, we propose TexADiff, a novel framework that begins by estimating a Relative Texture Density Map (RTDM) to represent the texture distribution. TexADiff then leverages this RTDM in three synergistic ways: as an explicit spatial conditioning to guide the diffusion process, as a loss modulation term to prioritize texture-rich regions, and as a dynamic adapter for the sampling schedule. These modifications are designed to endow the model with explicit texture-aware capabilities. Experiments demonstrate that TexADiff achieves superior or competitive quantitative metrics. Furthermore, qualitative results show that our model generates faithful high-frequency details while effectively suppressing texture hallucinations. This improved reconstruction quality also results in significant gains in downstream task performance. The source code of our method can be found at https://github.com/ZezFuture/TexAdiff.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

TexADiff adds a relative texture density map to diffusion super-resolution to handle the clustered, imbalanced textures common in remote sensing images.

The key point is that this paper estimates a Relative Texture Density Map and plugs it into the diffusion pipeline in three ways: as spatial conditioning, as a term that modulates the training loss toward texture-rich areas, and as a dynamic adjustment to the sampling schedule. That combination targets a real difference between natural images and remote sensing data, where textures are globally random but locally grouped. Standard diffusion SR models tend to mishandle such data by either smoothing details or hallucinating structures that do not fit the scene statistics. The authors argue that this explicit texture awareness improves high-frequency fidelity and downstream task performance, and they release code, which is useful for verification.

What stands out is the direct attack on the domain gap rather than treating remote sensing as just another dataset. The problem statement is clear, and the three integration points follow logically from the stated texture imbalance.

On the soft spots, the abstract gives no concrete numbers or baseline comparisons, so the size of the gains, and whether they hold across sensors or scenes, still needs checking in the full experiments. The map estimation step itself could be sensitive to the input noise or resolution changes typical of aerial data, and it is not yet obvious from the description whether all three uses are required or whether one or two carry most of the benefit. Ablations would clarify that.

Readers working on remote sensing enhancement, or on adapting generative models to data with strong spatial structure, will get the most from this. The method is coherent and the underlying issue is legitimate, so the paper deserves a serious referee even if the results section will likely need tightening. I would send it out for review.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes TexADiff, a texture-aware diffusion framework for remote sensing image super-resolution. It first estimates a Relative Texture Density Map (RTDM) to capture the globally stochastic yet locally clustered texture distribution characteristic of remote sensing images. The RTDM is then integrated in three synergistic ways: as explicit spatial conditioning to guide the diffusion process, as a loss modulation term to prioritize texture-rich regions, and as a dynamic adapter for the sampling schedule. Experiments are reported to show superior or competitive quantitative metrics, faithful high-frequency details with suppressed texture hallucinations, and significant gains in downstream task performance. The source code is released.

Significance. If the experimental claims hold, the work addresses a domain-specific challenge in applying diffusion priors to remote sensing super-resolution by explicitly modeling texture imbalance, which is not as pronounced in natural images. This could lead to improved reconstruction quality for applications such as land-use classification or object detection from satellite imagery. The public code release is a positive factor supporting reproducibility.

minor comments (3)

The abstract claims 'superior or competitive quantitative metrics' and 'significant gains in downstream task performance' without reporting specific values (e.g., PSNR, SSIM, or task accuracy deltas), baselines, or dataset names. Adding these details would make the summary more informative.
The three integration strategies for the RTDM (spatial conditioning, loss modulation, sampling adapter) are described at a high level; a figure or pseudocode illustrating their exact placement within the diffusion U-Net and sampling loop would improve clarity.
The paper should explicitly state the remote sensing datasets used for training and testing, along with the super-resolution factors evaluated, to allow direct comparison with prior RSISR methods.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of TexADiff, the recognition of its novelty in handling texture imbalance for remote sensing super-resolution, and the recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces TexADiff as a new diffusion-based framework for remote sensing image super-resolution. It defines a Relative Texture Density Map (RTDM) estimation step and then describes three explicit integration strategies (spatial conditioning, loss modulation, and dynamic sampling adapter) that are presented as novel modifications to address texture imbalance. No load-bearing step reduces by construction to a fitted parameter, self-citation, or renamed prior result; the central claims rest on experimental metrics and qualitative evaluation rather than any closed mathematical derivation that loops back to its own inputs. The method is self-contained against external benchmarks and code is released for verification.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Limited to abstract; main addition is the invented RTDM entity and its integration into standard diffusion priors. No free parameters or ad-hoc axioms are detailed.

axioms (1)

domain assumption · Generative diffusion priors can synthesize photorealistic details in natural images
Stated as a recent state-of-the-art achievement in the abstract.

invented entities (1)

Relative Texture Density Map (RTDM) · no independent evidence
purpose: Represent the texture distribution in remote sensing images to guide diffusion
Newly proposed to address globally stochastic yet locally clustered textures

pith-pipeline@v0.9.0 · 5543 in / 1284 out tokens · 32023 ms · 2026-05-10T13:19:29.510394+00:00 · methodology

[1] [1]

Dream- clear: High-capacity real-world image restoration with privacy-safe dataset curation

Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han, Zhengyu Chen, Quanzeng You, and Hongxia Yang. Dream- clear: High-capacity real-world image restoration with privacy-safe dataset curation. InNeurIPS, 2024. 2

work page 2024

[2] [2]

Blind super-resolution kernel estimation using an internal-gan

Sefi Bell-Kligler, Assaf Shocher, and Michal Irani. Blind super-resolution kernel estimation using an internal-gan. In NeurIPS, 2019. 2

work page 2019

[3] [3]

Pixart-α: Fast training of diffusion trans- former for photorealistic text-to-image synthesis

Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart-α: Fast training of diffusion trans- former for photorealistic text-to-image synthesis. InICLR,

work page

[4] [4]

FaithD- iff: Unleashing diffusion priors for faithful image super- resolution

Junyang Chen, Jinshan Pan, and Jiangxin Dong. FaithD- iff: Unleashing diffusion priors for faithful image super- resolution. InCVPR, 2025. 2, 5, 7

work page 2025

[5] [5]

Encoder-decoder with atrous separable convolution for semantic image segmentation

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018. 7

work page 2018

[6] [6]

Shenglong Chen, Yoshiki Ogawa, Chenbo Zhao, and Yoshi- hide Sekimoto. Large-scale individual building extrac- tion from open-source satellite imagery via super-resolution- based instance segmentation approach.ISPRS Journal of Photogrammetry and Remote Sensing, 2023. 1

work page 2023

[7] [7]

MMSegmentation: Openmmlab semantic segmentation toolbox and benchmark.https : / / github

MMSegmentation Contributors. MMSegmentation: Openmmlab semantic segmentation toolbox and benchmark.https : / / github . com / open - mmlab/mmsegmentation, 2020. 7

work page 2020

[8] [8]

Image quality assessment: Unifying structure and texture similarity.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 2020

Keyan Ding, Kede Ma, Shiqi Wang, and Eero P Simoncelli. Image quality assessment: Unifying structure and texture similarity.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 2020. 5

work page 2020

[9] [9]

Learning a deep convolutional network for image super-resolution

Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. InECCV, 2014. 2

work page 2014

[10] [10]

Auto-encoding varia- tional bayes

Diederik P Kingma and Max Welling. Auto-encoding varia- tional bayes. InICLR, 2014. 4

work page 2014

[11] [11]

Photo- realistic single image super-resolution using a generative ad- versarial network

Christian Ledig, Lucas Theis, Ferenc Husz´ar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo- realistic single image super-resolution using a generative ad- versarial network. InCVPR, 2017. 2

work page 2017

[12] [12]

Srdiff: Single image super-resolution with diffusion probabilistic models

Haoying Li, Yifan Yang, Meng Chang, Shiqi Chen, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen. Srdiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing, 2022. 2

work page 2022

[13] [13]

SwinIR: Image restoration using swin transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using swin transformer. InICCV, 2021. 2

work page 2021

[14] [14]

Enhanced deep residual networks for single image super-resolution

Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. InCVPR Workshops, 2017. 2

work page 2017

[15] [15]

Super-resolution-based change detection network with stacked attention module for images with different resolutions.IEEE Transactions on Geoscience and Remote Sensing, 2021

Mengxi Liu, Qian Shi, Andrea Marinoni, Da He, Xiaoping Liu, and Liangpei Zhang. Super-resolution-based change detection network with stacked attention module for images with different resolutions.IEEE Transactions on Geoscience and Remote Sensing, 2021. 1

work page 2021

[16] [16]

Yang Long, Gui-Song Xia, Shengyang Li, Wen Yang, Michael Ying Yang, Xiao Xiang Zhu, Liangpei Zhang, and Deren Li. On creating benchmark dataset for aerial image interpretation: Reviews, guidances, and million-aid.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021. 5

work page 2021

[17] [17]

RePaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. InCVPR,

work page

[18] [18]

No-reference image quality assessment in the spa- tial domain.IEEE Transactions on Image Processing, 2012

Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spa- tial domain.IEEE Transactions on Image Processing, 2012. 5

work page 2012

[19] [19]

completely blind

Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Mak- ing a “completely blind” image quality analyzer.IEEE Sig- nal Processing Letters, 2012. 5

work page 2012

[20] [20]

Controlnext: Powerful and effi- cient control for image and video generation.arXiv preprint arXiv:2408.06070, 2024

Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming- Chang Yang, and Jiaya Jia. ControlNext: Powerful and effi- cient control for image and video generation.arXiv preprint arXiv:2408.06070, 2024. 2, 4

work page arXiv 2024

[21] [21]

SDXL: Improving latent diffusion mod- els for high-resolution image synthesis

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion mod- els for high-resolution image synthesis. InICLR, 2024. 1, 7

work page 2024

[22]

Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.

[23]

Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, and Yujiu Yang. CoSeR: Bridging image and language for cognitive super-resolution. In CVPR, 2024.

[24]

Ce Wang and Wanjie Sun. Semantic guided large scale factor remote sensing image super-resolution with generative diffusion prior. ISPRS Journal of Photogrammetry and Remote Sensing, 2025.

[25]

Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zhong. LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv preprint arXiv:2110.08733, 2021.

[26]

Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring CLIP for assessing the look and feel of images. In AAAI, 2023.

[27]

Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision, 2024.

[28]

Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR, 2018.

[29]

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. ESRGAN: Enhanced super-resolution generative adversarial networks. In ECCV Workshops, 2018.

[30]

Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In ICCV Workshops, 2021.

[31]

Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C Kot, and Bihan Wen. SinSR: diffusion-based image super-resolution in a single step. In CVPR, 2024.

[32]

Gui-Song Xia, Jingwen Hu, Fan Hu, Baoguang Shi, Xiang Bai, Yanfei Zhong, Liangpei Zhang, and Xiaoqiang Lu. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 2017.

[33]

Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, and Liangpei Zhang. DOTA: A large-scale dataset for object detection in aerial images. In CVPR, 2018.

[34]

Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, and Lei Zhang. Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. In ECCV,

[35]

Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, and Chao Dong. Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. In CVPR, 2024.

[36]

Zongsheng Yue, Kang Liao, and Chen Change Loy. Arbitrary-steps image super-resolution via diffusion inversion. In CVPR, 2025.

[37]

Zongsheng Yue, Jianyi Wang, and Chen Change Loy. Efficient diffusion model for image restoration by residual shifting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[38]

Jiaqing Zhang, Jie Lei, Weiying Xie, Zhenman Fang, Yunsong Li, and Qian Du. SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 2023.

[39]

Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. In ICCV, 2021.

[40]

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In ICCV, 2023.

[41]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

[42]

Bei Zhao, Yanfei Zhong, Gui-Song Xia, and Liangpei Zhang. Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 2015.

[43]

Lijun Zhao, Ping Tang, and Lianzhi Huo. Feature significance-based multibag-of-visual-words model for remote sensing image scene classification. Journal of Applied Remote Sensing, 2016.