Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution

Aurelio Uncini; Christian Bianchi; Danilo Comminiello; Luigi Sigillo

arxiv: 2505.00334 · v3 · submitted 2025-05-01 · 💻 cs.CV · cs.LG

Luigi Sigillo, Christian Bianchi, Aurelio Uncini, Danilo Comminiello

Pith reviewed 2026-05-22 17:27 UTC · model grok-4.3

keywords: image super-resolution, diffusion models, quaternion wavelets, latent diffusion, image reconstruction, perceptual quality, wavelet conditioning

The pith

Quaternion wavelet embeddings dynamically integrated into latent diffusion models improve image super-resolution conditioning and perceptual quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ResQu, a framework that preprocesses low-resolution images with quaternion wavelets and feeds the resulting embeddings into a latent diffusion model through a custom quaternion wavelet- and time-aware encoder. This encoder supplies the embeddings at multiple stages of the denoising process rather than once at the start. The goal is to give the diffusion model richer structural and textural guidance so that high-resolution outputs preserve fine details and realistic textures even at large upscaling factors. A reader would care because super-resolution directly affects downstream tasks such as medical imaging, object detection, and satellite analysis where both visual realism and geometric accuracy matter. The work also draws on the generative priors already present in foundation models such as Stable Diffusion to further stabilize the reconstruction.

Core claim

The central claim is that a quaternion wavelet preprocessing stage combined with a quaternion wavelet- and time-aware encoder that injects embeddings dynamically throughout the denoising trajectory produces higher-fidelity super-resolved images than prior diffusion-based methods, as measured by both perceptual metrics and standard evaluation scores on domain-specific datasets.

What carries the argument

The quaternion wavelet- and time-aware encoder, which converts wavelet coefficients into quaternion-valued features and injects them at successive denoising timesteps to guide the latent diffusion process.
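
One way to picture this injection is as a feature-wise scale and shift predicted from the conditioning embedding and the timestep. The sketch below is an illustration in plain Python, not the authors' encoder: a paper passage quoted later on this page says the U-Net feature maps are modulated via SFT, but the function names (`timestep_embedding`, `linear`, `sft_modulate`), the toy weights, the `1 + gamma` form, and the tiny dimensions are all assumptions made here for clarity.

```python
import math

def timestep_embedding(t, dim=4, max_period=10000.0):
    """Sinusoidal embedding of a scalar timestep t (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sft_modulate(features, cond, t, w_scale, b_scale, w_shift, b_shift):
    """Scale-and-shift modulation: out = features * (1 + gamma) + beta.

    gamma and beta are predicted from the conditioning vector concatenated
    with the timestep embedding, so the same conditioning acts differently
    at different denoising steps. ASSUMPTION: the (1 + gamma) form is chosen
    here so that zero weights leave the features unchanged.
    """
    z = list(cond) + timestep_embedding(t)
    gamma = linear(z, w_scale, b_scale)
    beta = linear(z, w_shift, b_shift)
    return [f * (1.0 + g) + b for f, g, b in zip(features, gamma, beta)]

# Toy setup: 3 feature channels, 2 conditioning values, 4 timestep features.
features = [1.0, -2.0, 0.5]
cond = [0.3, -0.1]                      # stand-in for a wavelet embedding
w = [[0.1] * 6 for _ in range(3)]       # hand-picked, not learned
zero_bias = [0.0] * 3

early = sft_modulate(features, cond, 900, w, zero_bias, w, zero_bias)
late = sft_modulate(features, cond, 10, w, zero_bias, w, zero_bias)
print(early)
print(late)
```

Because the timestep embedding enters the prediction of `gamma` and `beta`, the two printed vectors differ even though `cond` is identical, which is the sense in which the conditioning is "time-aware" in this toy version.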

If this is right

Super-resolved images exhibit improved balance between fine texture recovery and geometric fidelity at high upscaling factors.
Leveraging pre-trained generative priors from models such as Stable Diffusion becomes more effective when paired with wavelet-based conditioning.
The same conditioning strategy can be applied across multiple domain-specific datasets while maintaining competitive performance on standard metrics.
Downstream vision tasks that rely on high-resolution inputs, such as detection and segmentation, receive higher-quality inputs from the super-resolution stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The dynamic multi-stage injection pattern may transfer to other diffusion-based image-to-image tasks such as denoising or inpainting.
Quaternion representations could offer similar benefits when applied to multi-spectral or multi-channel data beyond standard RGB images.
If the encoder proves robust, it could reduce the need for task-specific fine-tuning when adapting foundation diffusion models to new resolution targets.

Load-bearing premise

Dynamically inserting quaternion wavelet embeddings at multiple stages of the denoising process will strengthen conditioning and raise perceptual quality without creating new artifacts or structural distortions.

What would settle it

A controlled experiment on a held-out domain-specific test set in which the ResQu outputs show either lower perceptual scores than the strongest baseline or visible geometric distortions that are absent in the baseline reconstructions.

Figures

Figures reproduced from arXiv: 2505.00334 by Aurelio Uncini, Christian Bianchi, Danilo Comminiello, Luigi Sigillo.

**Figure 2.** Overview of ResQu Super-Resolution framework. We pre-trained [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** Comparison of the LR input image with SR outputs generated by state-of-the-art methods and our proposed model on the DRealSR [50] and RealSR [PITH_FULL_IMAGE:figures/full_fig_p006_3.png]

**Figure 4.** Impact of the number of sampling steps on key evaluation metrics. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png]

**Figure 5.** Visual comparison of diverse super-resolution results generated by [PITH_FULL_IMAGE:figures/full_fig_p007_5.png]

Abstract

Image Super-Resolution is a fundamental problem in computer vision with broad applications ranging from medical imaging to satellite analysis. The ability to reconstruct high-resolution images from low-resolution inputs is crucial for enhancing downstream tasks such as object detection and segmentation. While deep learning has significantly advanced SR, achieving high-quality reconstructions with fine-grained details and realistic textures remains challenging, particularly at high upscaling factors. Recent approaches leveraging diffusion models have demonstrated promising results, yet they often struggle to balance perceptual quality with structural fidelity. In this work, we introduce ResQu, a novel SR framework that integrates a quaternion wavelet preprocessing framework with latent diffusion models, incorporating a new quaternion wavelet- and time-aware encoder. Unlike prior methods that simply apply wavelet transforms within diffusion models, our approach enhances the conditioning process by exploiting quaternion wavelet embeddings, which are dynamically integrated at different stages of denoising. Furthermore, we leverage the generative priors of foundation models such as Stable Diffusion. Extensive experiments on domain-specific datasets demonstrate that our method achieves outstanding SR results, in many cases outperforming existing approaches in perceptual quality and standard evaluation metrics. The code is available at https://www.github.com/Fascetta/ResQu

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces ResQu, a super-resolution framework that combines quaternion wavelet preprocessing with latent diffusion models. It proposes a quaternion wavelet- and time-aware encoder that dynamically integrates embeddings at different stages of the denoising process while leveraging generative priors from Stable Diffusion. The central claim is that this yields outstanding SR results on domain-specific datasets, outperforming prior methods in perceptual quality and standard metrics.

Significance. If the empirical claims are substantiated, the work could offer a practical advance in conditioning diffusion models for SR by exploiting quaternion representations to capture RGB correlations alongside multi-scale wavelet features. The dynamic, time-aware integration and use of foundation-model priors represent engineering strengths, and the public code release supports reproducibility.

major comments (2)

[Method (quaternion wavelet- and time-aware encoder)] The central claim rests on the quaternion wavelet- and time-aware encoder dynamically improving conditioning without introducing artifacts. However, the method description provides no analysis or derivation of how quaternion components modulate the latent noise schedule or U-Net cross-attention at different timesteps t, leaving the interaction with the diffusion process unexamined.
[Abstract and Experiments] The abstract asserts that extensive experiments demonstrate outperforming existing approaches, yet no quantitative metrics, ablation studies on integration stages or wavelet scales, or error analysis are referenced. This makes it impossible to verify whether reported gains are load-bearing or sensitive to post-hoc choices.

minor comments (1)

[Abstract] The abstract mentions 'domain-specific datasets' without naming them or providing details on upscaling factors tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to improve clarity and substantiation of our claims.

Referee: [Method (quaternion wavelet- and time-aware encoder)] The central claim rests on the quaternion wavelet- and time-aware encoder dynamically improving conditioning without introducing artifacts. However, the method description provides no analysis or derivation of how quaternion components modulate the latent noise schedule or U-Net cross-attention at different timesteps t, leaving the interaction with the diffusion process unexamined.

Authors: We appreciate this observation regarding the need for deeper examination of the encoder's interaction with the diffusion process. The current manuscript details the architecture and dynamic integration of quaternion wavelet embeddings via the time-aware encoder at different denoising stages, leveraging Stable Diffusion priors. To strengthen this, we will add a dedicated analysis subsection in the revised version that includes both empirical visualizations of the modulation effects and, where feasible, a step-by-step derivation of how quaternion components influence cross-attention and the noise schedule across timesteps t. This addition will explicitly address potential artifact introduction and clarify the conditioning mechanism. revision: yes
Referee: [Abstract and Experiments] The abstract asserts that extensive experiments demonstrate outperforming existing approaches, yet no quantitative metrics, ablation studies on integration stages or wavelet scales, or error analysis are referenced. This makes it impossible to verify whether reported gains are load-bearing or sensitive to post-hoc choices.

Authors: We agree that the abstract would benefit from more explicit references to support the performance claims. Although the full manuscript presents quantitative results, ablation studies on integration stages and wavelet scales, and error analysis within the Experiments section on domain-specific datasets, we will revise the abstract to incorporate key metrics (e.g., PSNR, SSIM, LPIPS) and explicitly note the ablation findings. This will make the outperformance claims more verifiable without altering the high-level nature of the abstract. revision: yes
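
Of the metrics named in the response above, PSNR is the simplest to state precisely. The following is a minimal sketch in plain Python, assuming 8-bit images stored as nested lists; the function name `psnr` and the toy pixel values are illustrative only and are not taken from the paper or its code release. SSIM and LPIPS need windowed statistics and a pretrained network respectively, so they are not sketched here.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists of pixel intensities; max_value is the largest
    possible intensity (255 for 8-bit data).
    """
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2x2 "ground truth" and a reconstruction that is off by one everywhere.
hr = [[10, 64], [128, 200]]
sr = [[11, 65], [129, 201]]
print(round(psnr(hr, sr), 2))  # 48.13
```

Higher PSNR means lower mean squared error against the reference, which is why it tends to reward structural fidelity rather than perceptual realism; that tension is the one the abstract describes.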

Circularity Check

0 steps flagged

No significant circularity; empirical engineering contribution

The paper introduces ResQu as a novel SR framework that combines quaternion wavelet preprocessing with latent diffusion models via a new encoder for dynamic embedding integration. No derivation chain, equations, or first-principles predictions are presented that reduce claimed improvements to inputs by construction, fitted parameters renamed as outputs, or self-citation load-bearing premises. Central claims rest on experimental results across domain-specific datasets rather than any self-referential mathematical reduction, rendering the work self-contained as an empirical proposal.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of quaternion wavelets for image feature representation and on the benefit of multi-stage dynamic conditioning inside diffusion models; these are treated as domain assumptions rather than derived results.

free parameters (1)

Choice of integration stages and wavelet scales
The abstract states embeddings are dynamically integrated at different stages; the specific selection of stages and scales is a design choice that affects the result.

axioms (1)

Domain assumption: quaternion wavelet transforms capture image structure and color more effectively than real-valued wavelets for conditioning purposes.
Invoked by the choice of quaternion wavelet preprocessing framework.

pith-pipeline@v0.9.0 · 5738 in / 1232 out tokens · 34087 ms · 2026-05-22T17:27:23.480917+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

we introduce a novel quaternion wavelet- and time-aware encoder that enhances the conditioning process at multiple scales during denoising... QUAVE embeddings together with the timestep embeddings are passed through the encoder and used to modulate the intermediate feature maps of the U-Net via SFT
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear


Quaternion Wavelet Transform (QWT) integrates four quaternion wavelet transforms... yielding a total of 16 real sub-bands
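
The count of 16 in the passage above can be made concrete. The sketch below reads it as four real separable 2D wavelet transforms, each contributing four sub-bands (LL, LH, HL, HH); that reading, the helper names, and especially the filters are assumptions made here. A real QWT pairs each filter with its Hilbert-transform counterpart, whereas this toy version uses a Haar pair and a one-sample-delayed Haar pair as a stand-in, so it only illustrates the bookkeeping, not the phase properties.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Tree "g": Haar low-pass / high-pass pair.
HAAR = ([S, S], [S, -S])
# Tree "h": placeholder for the Hilbert-pair filters of a real QWT.
# ASSUMPTION: a one-sample-delayed Haar pair, used only to make the count concrete.
SHIFTED = ([0.0, S, S], [0.0, S, -S])

def analyze_1d(signal, taps):
    """Filter with periodic extension, then keep every second sample."""
    n = len(signal)
    return [sum(c * signal[(i + k) % n] for k, c in enumerate(taps))
            for i in range(0, n, 2)]

def analyze_rows(image, taps):
    return [analyze_1d(row, taps) for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def dwt2(image, row_pair, col_pair):
    """One-level separable 2D DWT: returns 4 sub-bands keyed LL, LH, HL, HH."""
    bands = {}
    for rname, rtaps in zip("LH", row_pair):
        rows_done = analyze_rows(image, rtaps)
        for cname, ctaps in zip("LH", col_pair):
            bands[rname + cname] = transpose(analyze_rows(transpose(rows_done), ctaps))
    return bands

def qwt_like_subbands(image):
    """Four real DWTs (gg, gh, hg, hh), four sub-bands each: 16 in total."""
    trees = {"g": HAAR, "h": SHIFTED}
    out = {}
    for rkey, rpair in trees.items():
        for ckey, cpair in trees.items():
            for name, band in dwt2(image, rpair, cpair).items():
                out[rkey + ckey + "_" + name] = band
    return out

image = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
subbands = qwt_like_subbands(image)
print(len(subbands))  # 16
```

Each of the 16 arrays is half the input size along both axes (4x4 for the 8x8 toy image), so a single-level decomposition of this kind is four times redundant relative to an ordinary real DWT.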

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

[1] J. Wang, Z. Yue, S. Zhou, K. C. Chan, and C. C. Loy, “Exploiting diffusion prior for real-world image super-resolution,” International Journal of Computer Vision, pp. 1–21, 2024

[2] D. C. Lepcha, B. Goyal, A. Dogra, and V. Goyal, “Image super-resolution: A comprehensive review, recent trends, challenges and applications,” Information Fusion, vol. 91, pp. 230–260, 2023

[3] L. Sigillo, R. Giamba, and D. Comminiello, “Metadata, wavelet, and time aware diffusion models for satellite image super resolution,” in ICLR 2025 Workshop on Machine Learning for Remote Sensing (ML4RS), 2025. [Online]. Available: https://ml-for-rs.github.io/iclr2025/ camera ready/papers/19.pdf

[4] J. Kim, J. Oh, and K. M. Lee, “Beyond image super-resolution for image recognition with task-driven perceptual loss,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2651–2661

[5] W.-Y. Hsu and P.-W. Jian, “Wavelet pyramid recurrent structure-preserving attention network for single image super-resolution,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 11, pp. 15 772–15 786, 2024

[6] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 22 500–22 510

[7] E. Lopez, L. Sigillo, F. Colonnese, M. Panella, and D. Comminiello, “Guess what I think: Streamlined EEG-to-image generation with latent diffusion models,” in ICASSP 2025 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5

[8] Y. Takagi and S. Nishimoto, “High-resolution image reconstruction with latent diffusion models from human brain activity,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 14 453–14 463

[9] Y. Wang, X. Chen, X. Ma, S. Zhou, Z. Huang, Y. Wang, C. Yang, Y. He, J. Yu, P. Yang et al., “Lavie: High-quality video generation with cascaded latent diffusion models,” International Journal of Computer Vision, pp. 1–20, 2024

[10] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, “Image super-resolution via iterative refinement,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, 2023

[11] S. Shang, Z. Shan, G. Liu, L. Wang, X. Wang, Z. Zhang, and J. Zhang, “Resdiff: Combining cnn and diffusion model for image super-resolution,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 8, 2024, pp. 8975–8983

[12] S. Gao, X. Liu, B. Zeng, S. Xu, Y. Li, X. Luo, J. Liu, X. Zhen, and B. Zhang, “Implicit diffusion models for continuous super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 10 021–10 030

[13] L. Sigillo, R. F. Gramaccioni, A. Nicolosi, and D. Comminiello, “Ship in sight: Diffusion models for ship-image super resolution,” in International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8

[14] R. Wu, T. Yang, L. Sun, Z. Zhang, S. Li, and L. Zhang, “Seesr: Towards semantics-aware real-world image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 25 456–25 467

[15] Y. Huang, J. Huang, J. Liu, M. Yan, Y. Dong, J. Lyu, C. Chen, and S. Chen, “Wavedm: Wavelet-based diffusion models for image restoration,” IEEE Transactions on Multimedia, 2024

[16] H. Phung, Q. Dao, and A. Tran, “Wavelet diffusion models are fast and scalable image generators,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 10 199–10 208

[17] L. Aloisi, L. Sigillo, A. Uncini, and D. Comminiello, A Wavelet Diffusion GAN for Image Super-Resolution. Singapore: Springer Nature Singapore, 2026, pp. 425–435. [Online]. Available: https://doi.org/10.1007/978-981-95-4072-3 36

[18] B. B. Moser, S. Frolov, F. Raue, S. Palacio, and A. Dengel, “Waving goodbye to low-res: A diffusion-wavelet approach for image super-resolution,” in 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8

[19] L. Sigillo, E. Grassucci, A. Uncini, and D. Comminiello, “Generalizing medical image representations via quaternion wavelet networks,” Neurocomputing, vol. 638, p. 130195, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231225008677

[20] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2021

[21] T. Dai, J. Cai, Y. Zhang, S.-T. Xia, and L. Zhang, “Second-order attention network for single image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 11 065–11 074

[22] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13. Springer, 2014, pp. 184–199

[23] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2015

[24] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 391–407

[25] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 105–114, 2016

[26] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conf. on computer vision (ECCV) workshops, 2018, pp. 0–0

[27] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. R. Fu, “Image super-resolution using very deep residual channel attention networks,” in European Conf. on Computer Vision, 2018

[28] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in Int. Conf. on Learning Representations, 2021

[29] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: Image restoration using swin transformer,” in 2021 IEEE/CVF Int. Conf. on Computer Vision Workshops (ICCVW), 2021

[30] S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin, “Pulse: Self-supervised photo upsampling via latent space exploration of generative models,” IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2434–2442, 2020

[31] K. Zhang, J. Liang, L. Van Gool, and R. Timofte, “Designing a practical degradation model for deep blind image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4791–4800

[32] X. Wang, L. Xie, C. Dong, and Y. Shan, “Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data,” in IEEE/CVF Int. Conf. on Computer Vision Workshops (ICCVW), 2021

[33] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Int. Conf. on Learning Representations, 2021

[34] A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” in Int. Conf. on Learning Representations, 2019

[35] X. Lin, J. He, Z. Chen, Z. Lyu, B. Dai, F. Yu, Y. Qiao, W. Ouyang, and C. Dong, “Diffbir: Toward blind image restoration with generative diffusion prior,” in European Conference on Computer Vision. Springer, 2025, pp. 430–448

[36] L. A. Barford, R. S. Fazzio, and D. R. Smith, An introduction to wavelets. Hewlett Packard, 1992

[37] M. Vetterli and C. Herley, “Wavelets and filter banks: Theory and design,” IEEE transactions on signal processing, 1992

[38] W. L. Chan, H. Choi, and R. Baraniuk, “Quaternion wavelets for image analysis and processing,” in 2004 International Conference on Image Processing, 2004. ICIP ’04., vol. 5. IEEE, 2004, pp. 3057–3060

[39] I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree complex wavelet transform,” IEEE signal processing magazine, vol. 22, no. 6, pp. 123–151, 2005

[41] Z. Zhancheng, L. Xiaoqing, X. Mengyu, W. Zhiwen, and L. Kai, “Medical image fusion based on quaternion wavelet transform,” Journal of Algorithms & Computational Technology, vol. 14, 2020

[42] W. L. Chan, H. Choi, and R. G. Baraniuk, “Coherent multiscale image processing using dual-tree quaternion wavelets,” IEEE Transactions on Image Processing, vol. 17, no. 7, pp. 1069–1082, 2008

[43] E. Grassucci, L. Sigillo, A. Uncini, and D. Comminiello, “GROUSE: A task and model agnostic wavelet-driven framework for medical imaging,” IEEE Signal Processing Letters, vol. 30, pp. 1397–1401, 2023

[44] W. L. Chan, H. Choi, and R. Baraniuk, “Quaternion wavelets for image analysis and processing,” in 2004 International Conference on Image Processing, 2004. ICIP ’04., vol. 5, 2004, pp. 3057–3060

[45] X. Wang, K. Yu, C. Dong, and C. Change Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2018

[46] J. Choi, J. Lee, C. Shin, S. Kim, H. Kim, and S. Yoon, “Perception prioritized training of diffusion models,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2022

[47] X. Ji, Y. Cao, Y. Tai, C. Wang, J. Li, and F. Huang, “Real-world super-resolution via kernel estimation and noise injection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 466–467

[48] C. Chen, X. Shi, Y. Qin, X. Li, X. Han, T. Yang, and S. Guo, “Real-world blind super-resolution via feature matching with implicit high-resolution priors,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 1329–1338

[49] Z. Yue, J. Wang, and C. C. Loy, “Resshift: Efficient diffusion model for image super-resolution by residual shifting,” Advances in Neural Information Processing Systems, vol. 36, 2024

[50] P. Wei, Z. Xie, H. Lu, Z. Zhan, Q. Ye, W. Zuo, and L. Lin, “Component divide-and-conquer for real-world image super-resolution,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16. Springer, 2020, pp. 101–117

[51] J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang, “Toward real-world single image super-resolution: A new benchmark and a new model,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 3086–3095

[52] E. Agustsson and R. Timofte, “Ntire challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017

[53] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, “Ntire challenge on single image super-resolution: Methods and results,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017

[54] X. Wang, K. Yu, C. Dong, and C. C. Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 606–615

[1] [1]

Exploiting diffusion prior for real-world image super-resolution,

J. Wang, Z. Yue, S. Zhou, K. C. Chan, and C. C. Loy, “Exploiting diffusion prior for real-world image super-resolution,”International Journal of Computer Vision, pp. 1–21, 2024

work page 2024

[2] [2]

Image super- resolution: A comprehensive review, recent trends, challenges and ap- plications,

D. C. Lepcha, B. Goyal, A. Dogra, and V . Goyal, “Image super- resolution: A comprehensive review, recent trends, challenges and ap- plications,”Information Fusion, vol. 91, pp. 230–260, 2023

work page 2023

[3] [3]

Metadata, wavelet, and time aware diffusion models for satellite image super resolution,

L. Sigillo, R. Giamba, and D. Comminiello, “Metadata, wavelet, and time aware diffusion models for satellite image super resolution,” inICLR 2025 Workshop on Machine Learning for Remote Sensing (ML4RS), 2025. [Online]. Available: https://ml-for-rs.github.io/iclr2025/ camera ready/papers/19.pdf

work page 2025

[4] [4]

Beyond image super-resolution for image recognition with task-driven perceptual loss,

J. Kim, J. Oh, and K. M. Lee, “Beyond image super-resolution for image recognition with task-driven perceptual loss,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2651–2661

work page 2024

[5] [5]

Wavelet pyramid recurrent structure- preserving attention network for single image super-resolution,

W.-Y . Hsu and P.-W. Jian, “Wavelet pyramid recurrent structure- preserving attention network for single image super-resolution,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 11, pp. 15 772–15 786, 2024

work page 2024

[6] [6]

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation,

N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 22 500–22 510

work page 2023

[7] [7]

Guess what i think: Streamlined EEG-to-image generation with latent diffusion models,

E. Lopez, L. Sigillo, F. Colonnese, M. Panella, and D. Comminiello, “Guess what i think: Streamlined EEG-to-image generation with latent diffusion models,” in ICASSP 2025 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025, pp. 1–5

work page 2025

[8] [8]

High-resolution image reconstruction with latent diffusion models from human brain activity,

Y. Takagi and S. Nishimoto, “High-resolution image reconstruction with latent diffusion models from human brain activity,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 14 453–14 463

work page 2023

[9] [9]

Lavie: High-quality video generation with cascaded latent diffusion models,

Y. Wang, X. Chen, X. Ma, S. Zhou, Z. Huang, Y. Wang, C. Yang, Y. He, J. Yu, P. Yang et al., “Lavie: High-quality video generation with cascaded latent diffusion models,” International Journal of Computer Vision, pp. 1–20, 2024

work page 2024

[10] [10]

Image super-resolution via iterative refinement,

C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, “Image super-resolution via iterative refinement,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, 2023

work page 2023

[11] [11]

Resdiff: Combining cnn and diffusion model for image super-resolution,

S. Shang, Z. Shan, G. Liu, L. Wang, X. Wang, Z. Zhang, and J. Zhang, “Resdiff: Combining cnn and diffusion model for image super-resolution,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 8, 2024, pp. 8975–8983

work page 2024

[12] [12]

Implicit diffusion models for continuous super-resolution,

S. Gao, X. Liu, B. Zeng, S. Xu, Y. Li, X. Luo, J. Liu, X. Zhen, and B. Zhang, “Implicit diffusion models for continuous super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 10 021–10 030

work page 2023

[13] [13]

Ship in sight: Diffusion models for ship-image super resolution,

L. Sigillo, R. F. Gramaccioni, A. Nicolosi, and D. Comminiello, “Ship in sight: Diffusion models for ship-image super resolution,” in International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8

work page 2024

[14] [14]

Seesr: Towards semantics-aware real-world image super-resolution,

R. Wu, T. Yang, L. Sun, Z. Zhang, S. Li, and L. Zhang, “Seesr: Towards semantics-aware real-world image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 25 456–25 467

work page 2024

[15] [15]

Wavedm: Wavelet-based diffusion models for image restoration,

Y. Huang, J. Huang, J. Liu, M. Yan, Y. Dong, J. Lyu, C. Chen, and S. Chen, “Wavedm: Wavelet-based diffusion models for image restoration,” IEEE Transactions on Multimedia, 2024

work page 2024

[16] [16]

Wavelet diffusion models are fast and scalable image generators,

H. Phung, Q. Dao, and A. Tran, “Wavelet diffusion models are fast and scalable image generators,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 10 199–10 208

work page 2023

[17] [17]

A Wavelet Diffusion GAN for Image Super-Resolution,

L. Aloisi, L. Sigillo, A. Uncini, and D. Comminiello, A Wavelet Diffusion GAN for Image Super-Resolution. Singapore: Springer Nature Singapore, 2026, pp. 425–435. [Online]. Available: https://doi.org/10.1007/978-981-95-4072-3_36

work page doi:10.1007/978-981-95-4072-3 2026

[18] [18]

Waving goodbye to low-res: A diffusion-wavelet approach for image super-resolution,

B. B. Moser, S. Frolov, F. Raue, S. Palacio, and A. Dengel, “Waving goodbye to low-res: A diffusion-wavelet approach for image super-resolution,” in 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8

work page 2024

[19] [19]

Generalizing medical image representations via quaternion wavelet networks,

L. Sigillo, E. Grassucci, A. Uncini, and D. Comminiello, “Generalizing medical image representations via quaternion wavelet networks,” Neurocomputing, vol. 638, p. 130195, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231225008677

work page 2025

[20] [20]

High-resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2021

work page 2021

[21] [21]

Second-order attention network for single image super-resolution,

T. Dai, J. Cai, Y. Zhang, S.-T. Xia, and L. Zhang, “Second-order attention network for single image super-resolution,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 11 065–11 074

work page 2019

[22] [22]

Learning a deep convolutional network for image super-resolution,

C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13. Springer, 2014, pp. 184–199

work page 2014

[23] [23]

Image super-resolution using deep convolutional networks,

——, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2015

work page 2015

[24] [24]

Accelerating the super-resolution convolutional neural network,

C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 391–407

work page 2016

[25] [25]

Photo-realistic single image super-resolution using a generative adversarial network,

C. Ledig, L. Theis, F. Huszár, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 105–114, 2016

work page 2016

[26] [26]

ESRGAN: Enhanced super-resolution generative adversarial networks,

X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conf. on computer vision (ECCV) workshops, 2018, pp. 0–0

work page 2018

[27] [27]

Image super-resolution using very deep residual channel attention networks,

Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. R. Fu, “Image super-resolution using very deep residual channel attention networks,” in European Conf. on Computer Vision, 2018

work page 2018

[28] [28]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in Int. Conf. on Learning Representations, 2021

work page 2021

[29] [29]

SwinIR: Image restoration using swin transformer,

J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: Image restoration using swin transformer,” in 2021 IEEE/CVF Int. Conf. on Computer Vision Workshops (ICCVW), 2021

work page 2021

[30] [30]

Pulse: Self-supervised photo upsampling via latent space exploration of generative models,

S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin, “Pulse: Self-supervised photo upsampling via latent space exploration of generative models,” IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2434–2442, 2020

work page 2020

[31] [31]

Designing a practical degradation model for deep blind image super-resolution,

K. Zhang, J. Liang, L. Van Gool, and R. Timofte, “Designing a practical degradation model for deep blind image super-resolution,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4791–4800

work page 2021

[32] [32]

Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data,

X. Wang, L. Xie, C. Dong, and Y. Shan, “Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data,” in IEEE/CVF Int. Conf. on Computer Vision Workshops (ICCVW), 2021

work page 2021

[33] [33]

Denoising diffusion implicit models,

J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Int. Conf. on Learning Representations, 2021

work page 2021

[34] [34]

Large scale GAN training for high fidelity natural image synthesis,

A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” in Int. Conf. on Learning Representations, 2019

work page 2019

[35] [35]

Diffbir: Toward blind image restoration with generative diffusion prior,

X. Lin, J. He, Z. Chen, Z. Lyu, B. Dai, F. Yu, Y. Qiao, W. Ouyang, and C. Dong, “Diffbir: Toward blind image restoration with generative diffusion prior,” in European Conference on Computer Vision. Springer, 2025, pp. 430–448

work page 2025

[36] [36]

L. A. Barford, R. S. Fazzio, and D. R. Smith, An introduction to wavelets. Hewlett Packard, 1992

work page 1992

[37] [37]

Wavelets and filter banks: Theory and design,

M. Vetterli and C. Herley, “Wavelets and filter banks: Theory and design,” IEEE transactions on signal processing, 1992

work page 1992

[38] [38]

Quaternion wavelets for image analysis and processing,

W. L. Chan, H. Choi, and R. Baraniuk, “Quaternion wavelets for image analysis and processing,” in 2004 International Conference on Image Processing, 2004. ICIP ’04., vol. 5. IEEE, 2004, pp. 3057–3060

work page 2004

[39] [39]

The dual-tree complex wavelet transform,

I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree complex wavelet transform,” IEEE signal processing magazine, vol. 22, no. 6, pp. 123–151, 2005

work page 2005

[40] [41]

Medical image fusion based on quaternion wavelet transform,

Z. Zhancheng, L. Xiaoqing, X. Mengyu, W. Zhiwen, and L. Kai, “Medical image fusion based on quaternion wavelet transform,” Journal of Algorithms & Computational Technology, vol. 14, 2020

work page 2020

[41] [42]

Coherent multiscale image processing using dual-tree quaternion wavelets,

W. L. Chan, H. Choi, and R. G. Baraniuk, “Coherent multiscale image processing using dual-tree quaternion wavelets,” IEEE Transactions on Image Processing, vol. 17, no. 7, pp. 1069–1082, 2008

work page 2008

[42] [43]

GROUSE: A task and model agnostic wavelet-driven framework for medical imaging,

E. Grassucci, L. Sigillo, A. Uncini, and D. Comminiello, “GROUSE: A task and model agnostic wavelet-driven framework for medical imaging,” IEEE Signal Processing Letters, vol. 30, pp. 1397–1401, 2023

work page 2023

[43] [44]

Quaternion wavelets for image analysis and processing,

W. L. Chan, H. Choi, and R. Baraniuk, “Quaternion wavelets for image analysis and processing,” in 2004 International Conference on Image Processing, 2004. ICIP ’04., vol. 5, 2004, pp. 3057–3060

work page 2004

[44] [45]

Recovering realistic texture in image super-resolution by deep spatial feature transform,

X. Wang, K. Yu, C. Dong, and C. Change Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2018

work page 2018

[45] [46]

Perception prioritized training of diffusion models,

J. Choi, J. Lee, C. Shin, S. Kim, H. Kim, and S. Yoon, “Perception prioritized training of diffusion models,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2022

work page 2022

[46] [47]

Real-world super-resolution via kernel estimation and noise injection,

X. Ji, Y. Cao, Y. Tai, C. Wang, J. Li, and F. Huang, “Real-world super-resolution via kernel estimation and noise injection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 466–467

work page 2020

[47] [48]

Real-world blind super-resolution via feature matching with implicit high-resolution priors,

C. Chen, X. Shi, Y. Qin, X. Li, X. Han, T. Yang, and S. Guo, “Real-world blind super-resolution via feature matching with implicit high-resolution priors,” in Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 1329–1338

work page 2022

[48] [49]

Resshift: Efficient diffusion model for image super-resolution by residual shifting,

Z. Yue, J. Wang, and C. C. Loy, “Resshift: Efficient diffusion model for image super-resolution by residual shifting,” Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024

[49] [50]

Component divide-and-conquer for real-world image super-resolution,

P. Wei, Z. Xie, H. Lu, Z. Zhan, Q. Ye, W. Zuo, and L. Lin, “Component divide-and-conquer for real-world image super-resolution,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16. Springer, 2020, pp. 101–117

work page 2020

[50] [51]

Toward real-world single image super-resolution: A new benchmark and a new model,

J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang, “Toward real-world single image super-resolution: A new benchmark and a new model,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 3086–3095

work page 2019

[51] [52]

Ntire challenge on single image super-resolution: Dataset and study,

E. Agustsson and R. Timofte, “Ntire challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017

work page 2017

[52] [53]

Ntire challenge on single image super-resolution: Methods and results,

R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, “Ntire challenge on single image super-resolution: Methods and results,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017

work page 2017

[53] [54]

Recovering realistic texture in image super-resolution by deep spatial feature transform,

X. Wang, K. Yu, C. Dong, and C. C. Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 606–615

work page 2018