FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
Pith reviewed 2026-05-17 03:20 UTC · model grok-4.3
The pith
FRAMER aligns low- and high-frequency features via self-distillation to improve detail recovery in diffusion-based real-image super-resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FRAMER is a plug-and-play training scheme in which, at each denoising step, the final-layer feature map teaches every intermediate layer. Teacher and student feature maps are decomposed into low-frequency and high-frequency bands via FFT masks so that supervision respects the model's internal frequency hierarchy. An Intra Contrastive Loss stabilizes globally shared low-frequency structure, while an Inter Contrastive Loss sharpens instance-specific high-frequency details using random-layer and in-batch negatives. Two adaptive modulators, Frequency-based Adaptive Weight and Frequency-based Alignment Modulation, reweight per-layer signals and gate distillation according to current similarity.
What carries the argument
Frequency-aligned self-distillation that decomposes features into LF/HF bands with FFT masks, applies IntraCL and InterCL contrastive losses, and modulates supervision with FAW and FAM.
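A minimal sketch of the FFT-mask band split described above, assuming NumPy, a (C, H, W) feature map, and a circular low-pass mask. The function name `split_bands` and the cutoff `radius_frac` are hypothetical; the source does not state the paper's mask shape or cutoff.

```python
import numpy as np


def split_bands(feat, radius_frac=0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    radius_frac is an assumed cutoff: the low-pass disc radius as a
    fraction of the shorter spatial side. The paper's mask may differ.
    """
    _, h, w = feat.shape
    axes = (-2, -1)
    # Move the zero frequency to the centre so the low-pass mask is a disc.
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=axes), axes=axes)
    yy, xx = np.mgrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    lf_mask = dist <= radius_frac * min(h, w)

    def to_spatial(masked_spec):
        # Undo the shift, invert the FFT, and keep the real part.
        return np.fft.ifft2(np.fft.ifftshift(masked_spec, axes=axes), axes=axes).real

    lf = to_spatial(spec * lf_mask)
    hf = to_spatial(spec * ~lf_mask)
    return lf, hf


# Stand-in activations; a real run would use teacher or student features.
features = np.random.default_rng(0).standard_normal((4, 16, 16))
lf_part, hf_part = split_bands(features)
```

Because the two masks are complementary, the low- and high-frequency parts sum back to the input, so the same split can be applied to teacher and student maps without losing information.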
If this is right
- Consistent gains appear in both reconstruction metrics (PSNR/SSIM) and perceptual metrics (LPIPS, NIQE, MANIQA, MUSIQ).
- The scheme works without any change to the diffusion backbone or to inference speed.
- Results hold across U-Net and DiT architectures including Stable Diffusion 2 and 3.
- Ablations confirm that the final layer as teacher and random-layer negatives are important contributors.
Where Pith is reading between the lines
- The same FFT-based band decomposition and adaptive contrastive supervision could be tested on other generative tasks that exhibit frequency bias, such as image inpainting or text-to-image synthesis.
- Because the method leaves the trained model unchanged at inference, it could be combined with existing acceleration techniques for diffusion sampling.
- Extending the modulators to condition on degradation type might further improve robustness when degradation statistics vary strongly across images.
Load-bearing premise
The final-layer feature map serves as an effective teacher for intermediate layers once features are decomposed into low- and high-frequency bands via FFT masks and this decomposition matches the model's internal low-first high-later hierarchy.
What would settle it
Train identical diffusion backbones on the same real-world super-resolution data, once with full FRAMER and once with the FFT decomposition or the final-layer teacher removed; if the full scheme shows no gain in PSNR, SSIM, and perceptual metrics over the ablated runs, the central claim is falsified.
Original abstract
Real-image super-resolution (Real-ISR) seeks to recover HR images from LR inputs with mixed, unknown degradations. While diffusion models surpass GANs in perceptual quality, they under-reconstruct high-frequency (HF) details due to a low-frequency (LF) bias and a depth-wise "low-first, high-later" hierarchy. We introduce FRAMER, a plug-and-play training scheme that exploits diffusion priors without changing the backbone or inference. At each denoising step, the final-layer feature map teaches all intermediate layers. Teacher and student feature maps are decomposed into LF/HF bands via FFT masks to align supervision with the model's internal frequency hierarchy. For LF, an Intra Contrastive Loss (IntraCL) stabilizes globally shared structure. For HF, an Inter Contrastive Loss (InterCL) sharpens instance-specific details using random-layer and in-batch negatives. Two adaptive modulators, Frequency-based Adaptive Weight (FAW) and Frequency-based Alignment Modulation (FAM), reweight per-layer LF/HF signals and gate distillation by current similarity. Across U-Net and DiT backbones (e.g., Stable Diffusion 2, 3), FRAMER consistently improves PSNR/SSIM and perceptual metrics (LPIPS, NIQE, MANIQA, MUSIQ). Ablations validate the final-layer teacher and random-layer negatives.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FRAMER, a plug-and-play self-distillation training scheme for real-world image super-resolution that leverages diffusion priors. At each denoising step, final-layer feature maps teach intermediate layers after FFT-based decomposition into low-frequency (LF) and high-frequency (HF) bands. IntraCL stabilizes shared LF structure while InterCL sharpens instance-specific HF details using random-layer negatives; FAW and FAM modulators adaptively reweight and gate the signals. The method is evaluated on U-Net and DiT backbones (Stable Diffusion 2/3) and reports consistent gains in PSNR/SSIM plus perceptual metrics (LPIPS, NIQE, MANIQA, MUSIQ), with ablations supporting the final-layer teacher and random negatives.
Significance. If the quantitative claims hold, FRAMER provides an architecture- and inference-preserving way to mitigate the low-frequency bias and depth-wise hierarchy in diffusion models for restoration. The frequency-aligned contrastive formulation and adaptive modulators are a concrete contribution that could be adopted in other generative restoration pipelines; the plug-and-play nature and reported cross-backbone consistency are strengths.
major comments (2)
- [§3.2] §3.2 and Eq. (3)–(5): the claim that FFT-mask decomposition aligns supervision with the model's internal 'low-first, high-later' hierarchy rests on the unverified assumption that final-layer features are an effective teacher once separated into LF/HF bands; no layer-wise frequency-content analysis or correlation study is provided to substantiate this alignment.
- [Table 1] Table 1 (main results): reported PSNR/SSIM and perceptual-metric gains are presented without error bars, standard deviations across seeds, or statistical significance tests; this weakens the 'consistently improves' claim across U-Net and DiT backbones.
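The layer-wise frequency-content analysis requested in the first major comment could look roughly like the sketch below. It assumes NumPy and measures the share of spectral energy inside a centred low-pass disc; the name `lf_energy_ratio`, the cutoff `radius_frac`, and the two stand-in "layers" are illustrative assumptions, not the paper's protocol or data.

```python
import numpy as np


def lf_energy_ratio(feat, radius_frac=0.25):
    """Fraction of spectral energy inside a centred low-pass disc.

    feat has shape (C, H, W); radius_frac is an assumed cutoff.
    """
    h, w = feat.shape[-2:]
    axes = (-2, -1)
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=axes), axes=axes)
    power = np.abs(spec) ** 2
    yy, xx = np.mgrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    lf_mask = dist <= radius_frac * min(h, w)
    return float(power[..., lf_mask].sum() / power.sum())


# Synthetic stand-ins for two layers' activations (not real model features).
yy, xx = np.mgrid[:32, :32]
smooth = np.sin(2 * np.pi * xx / 32.0)[None]  # one slow horizontal cycle
noisy = np.random.default_rng(0).standard_normal((8, 32, 32))

ratios = {name: lf_energy_ratio(f) for name, f in [("smooth", smooth), ("noisy", noisy)]}
```

Computed per layer on real activations, a ratio that falls with depth would be consistent with the "low-first, high-later" hierarchy; a flat or rising ratio would count against it.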
minor comments (2)
- [§4.3] §4.3: the ablation tables would benefit from explicit listing of all hyper-parameters (temperature, negative count, modulator thresholds) to enable reproduction.
- Notation: the distinction between IntraCL and InterCL is clear in text but the precise negative-sampling procedure for InterCL could be summarized in a single equation or algorithm box.
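As a rough illustration of the kind of summary the last minor comment asks for, the sketch below shows a generic InfoNCE loss with in-batch negatives over pooled high-frequency features. It is an assumption-laden stand-in: the function name `info_nce`, the temperature value, and the use of cosine similarity are not taken from the paper, and the random-layer negatives are not modelled here.

```python
import numpy as np


def info_nce(student, teacher, temperature=0.1):
    """Generic InfoNCE over a batch of (B, D) feature vectors.

    Row i of `student` is pulled toward row i of `teacher`; every other
    teacher row in the batch acts as an in-batch negative.
    """
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature                     # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Stand-in pooled features; real inputs would be high-frequency band features.
rng = np.random.default_rng(0)
teacher_hf = rng.standard_normal((8, 32))
student_hf = teacher_hf + 0.01 * rng.standard_normal((8, 32))

aligned_loss = info_nce(student_hf, teacher_hf)
mismatched_loss = info_nce(student_hf[::-1], teacher_hf)
```

The loss is low when each student row matches its own teacher row and high when the pairing is scrambled, which is the instance-discriminating behaviour the report attributes to InterCL.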
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the recommendation for minor revision. The feedback on the motivation for frequency-aligned supervision and the presentation of quantitative results is valuable. We address each major comment below and commit to incorporating the suggested improvements in the revised manuscript.
Point-by-point responses
- Referee: [§3.2] §3.2 and Eq. (3)–(5): the claim that FFT-mask decomposition aligns supervision with the model's internal 'low-first, high-later' hierarchy rests on the unverified assumption that final-layer features are an effective teacher once separated into LF/HF bands; no layer-wise frequency-content analysis or correlation study is provided to substantiate this alignment.
  Authors: We appreciate the referee's observation. The choice of the final layer as teacher after FFT-based LF/HF decomposition is grounded in the established low-frequency bias and depth-wise hierarchy of diffusion models, as noted in the manuscript introduction and related work. Our ablation studies already demonstrate that the final-layer teacher outperforms intermediate-layer alternatives when paired with the frequency decomposition and contrastive losses. To provide direct empirical support for the alignment assumption, we will add a layer-wise frequency-content analysis (quantifying LF/HF energy ratios across layers) to the revised §3.2 and supplementary material.
  Revision: yes
- Referee: [Table 1] Table 1 (main results): reported PSNR/SSIM and perceptual-metric gains are presented without error bars, standard deviations across seeds, or statistical significance tests; this weakens the 'consistently improves' claim across U-Net and DiT backbones.
  Authors: We agree that error bars and statistical tests would strengthen the presentation of the quantitative results. In the revised manuscript we will report standard deviations over multiple random seeds for all entries in Table 1 and include paired statistical significance tests (e.g., t-tests) for the observed improvements. The existing results already show consistent gains across two architecturally distinct backbones (U-Net and DiT) and multiple complementary metrics, which we view as supporting evidence of robustness; the additional statistics will further reinforce this claim.
  Revision: yes
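The per-seed statistics promised in the second response could be computed along the lines of the sketch below, which uses only the Python standard library. The helper name `paired_t` and the score lists are placeholders invented for illustration; they are not results from the paper.

```python
import math
import statistics


def paired_t(baseline, method):
    """Paired t statistic for per-seed scores of two methods.

    Returns (mean difference, sample std of differences, t, degrees of freedom).
    Needs at least two seeds and non-identical differences (std > 0).
    """
    diffs = [m - b for b, m in zip(baseline, method)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, sd_diff, t_stat, n - 1


# Placeholder scores for four seeds; NOT values from the paper.
baseline_scores = [1.0, 2.0, 3.0, 4.0]
method_scores = [1.5, 2.4, 3.6, 4.5]

mean_diff, sd_diff, t_stat, dof = paired_t(baseline_scores, method_scores)
```

The t statistic would then be compared against a Student's t distribution with `dof` degrees of freedom to obtain a p-value; pairing by seed removes seed-to-seed variance that is shared by both methods.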
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper presents FRAMER as an empirical plug-and-play self-distillation training procedure that decomposes features into LF/HF bands using FFT masks, applies IntraCL for shared structure and InterCL for instance-specific details with random negatives, and employs FAW/FAM modulators for reweighting and gating. This is applied at each denoising step with the final-layer map as teacher for intermediate layers, without any shown equations that reduce by construction to fitted parameters, self-definitions, or renamed known results. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled via citation are described; ablations are cited to validate components independently. The claimed metric gains across U-Net and DiT backbones follow directly from the introduced scheme rather than circular re-expression of inputs, making the derivation self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Teacher and student feature maps are decomposed into LF/HF bands via FFT masks... Intra Contrastive Loss (IntraCL) for LF... Inter Contrastive Loss (InterCL) for HF... FAW and FAM modulators
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.