Recognition: 2 theorem links
OP4KSR: One-Step Patch-Free 4K Super-Resolution with Periodic Artifact Suppression
Pith reviewed 2026-05-14 20:32 UTC · model grok-4.3
The pith
OP4KSR enables direct 4K super-resolution of full images in one diffusion step by using F16 VAE compression and fixing periodic artifacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OP4KSR adapts the Flux backbone for one-step super-resolution to 4K by employing F16 VAE for extreme compression to fit within GPU limits. It addresses the resulting periodic artifacts through RoPE base frequency rescaling and an autocorrelation-based periodicity loss, while also introducing a new training dataset and benchmarks. This yields competitive perceptual quality with full global context preserved and fast inference.
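The memory arithmetic behind the F16 choice can be sketched with a back-of-envelope token count. This is a hedged illustration: the 2×2 latent patchification is an assumption carried over from the stock Flux architecture, not a figure stated in this paper.

```python
def token_count(image_side: int, vae_factor: int, patch: int = 2) -> int:
    """Transformer tokens for a square image: the VAE downsamples
    spatially by vae_factor (8 for a stock F8 VAE, 16 for the F16 VAE
    used here), then the latent is split into patch x patch tokens."""
    latent_side = image_side // vae_factor
    return (latent_side // patch) ** 2

tokens_f8 = token_count(4096, 8)    # 256 x 256 = 65,536 tokens
tokens_f16 = token_count(4096, 16)  # 128 x 128 = 16,384 tokens

# Self-attention cost grows quadratically in token count, so halving
# the latent side cuts the 4K attention budget by roughly 16x.
ratio = (tokens_f8 / tokens_f16) ** 2
print(tokens_f8, tokens_f16, ratio)
```

Under these assumptions, F16 compression is what brings a full 4096×4096 image within a single-GPU attention budget without tiling.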
What carries the argument
F16 VAE compression paired with RoPE base frequency rescaling and autocorrelation-based periodicity loss for artifact-free one-step 4K super-resolution.
If this is right
- Enables generation of 4096x4096 images while preserving global spatial and semantic coherence.
- Reduces inference time to 5.75 seconds per 4K output on a single H20 GPU.
- Provides dedicated 4K SR datasets and benchmarks for future research.
- Achieves perceptual quality comparable to prior methods without patch-related inconsistencies.
Where Pith is reading between the lines
- The technique may apply to other high-resolution generative tasks where memory limits force one-step processing.
- Similar artifact suppression could improve consistency in other diffusion-based image restoration methods.
- Future work might test if this scales to even higher resolutions like 8K with adjusted compression.
Load-bearing premise
The F16 VAE must preserve enough detail for the one-step model to reach competitive perceptual quality without introducing new degradations from the compression.
What would settle it
Running OP4KSR head-to-head against patch-based baselines on the real-world 4K benchmarks: the claim fails if its outputs score lower on perceptual metrics or still exhibit periodic patterns.
Figures
Original abstract
Diffusion-based real-world image super-resolution (Real-ISR) has achieved remarkable perceptual quality; however, directly super-resolving images to 4K remains limited by extreme memory consumption. Consequently, prior methods adopt patch-based inference, sacrificing global context and introducing semantic confusion, spatial inconsistency, and severe latency. We propose OP4KSR, a one-step patch-free 4K SR approach built upon the powerful Flux backbone. By leveraging the extreme-compression F16 VAE, OP4KSR makes 4K SR inference tractable under practical GPU budgets, preserving global spatial-semantic coherence while enabling highly efficient inference. However, adapting this one-step architecture intrinsically triggers severe periodic artifacts. We trace this to a RoPE base frequency allocation mismatch and intra-token spatial ambiguity, both exacerbated by the lack of iterative refinement. To suppress these artifacts, we couple RoPE base frequency rescaling (RFR) with an autocorrelation-based periodicity loss ($\mathcal{L}_\text{AP}$). Furthermore, we curate a dedicated training dataset alongside three benchmarks (one synthetic and two real-world) to advance 4K SR research. Extensive experiments demonstrate that OP4KSR achieves competitive perceptual quality with efficient inference, generating a $4096\times4096$ output in only 5.75 seconds on a single NVIDIA H20 GPU.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces OP4KSR, a one-step patch-free 4K super-resolution method built on the Flux diffusion backbone. It employs an F16 VAE for extreme latent compression to enable full-image inference under practical GPU memory limits, avoiding the semantic and spatial inconsistencies of patch-based approaches. To address periodic artifacts induced by the one-step regime, the authors propose RoPE base frequency rescaling (RFR) together with an autocorrelation-based periodicity loss (L_AP). A dedicated training dataset and three 4K benchmarks (one synthetic, two real-world) are curated. Experiments are claimed to show competitive perceptual quality at 5.75 s inference for 4096×4096 outputs on a single NVIDIA H20 GPU.
Significance. If the quantitative results hold, the work would be significant for practical 4K Real-ISR by demonstrating that extreme VAE compression plus targeted artifact suppression can deliver global coherence without patch stitching or multi-step denoising. The reported runtime is a clear practical advantage over prior patch-based diffusion SR methods. However, the significance is tempered by the absence of explicit high-frequency preservation metrics or direct comparisons against strong patch-based baselines in the abstract, leaving the efficiency-quality tradeoff only weakly substantiated from the provided text.
Major comments (2)
- [Abstract and §3 (method)] The headline claim of competitive perceptual quality rests on the assumption that F16 VAE compression preserves sufficient high-frequency content for the one-step Flux model to match or exceed patch-based Real-ISR methods. This is load-bearing for the central contribution yet is not directly tested; extreme latent downsampling inherently discards fine spatial frequencies, and neither RFR nor L_AP can restore lost detail. A quantitative ablation (e.g., frequency-domain energy comparison or LPIPS/perceptual metrics before/after F16 encoding) is required in the experiments section to support the claim.
- [Abstract and §4 (experiments)] The abstract states 'extensive experiments demonstrate competitive perceptual quality' but supplies no numerical values, error bars, or table references. Without these, the reader cannot assess whether the 5.75 s runtime trades off sharpness for coherence. The experiments section must include direct side-by-side metrics (PSNR, LPIPS, NIQE, user study) against at least two recent patch-based 4K baselines.
Minor comments (2)
- [Abstract and §3] The autocorrelation loss is introduced as L_AP in the abstract; its mathematical form should be defined at its first occurrence in §3 and the notation used consistently thereafter.
- [§4] The three new benchmarks are mentioned but their construction details (synthetic degradation model, real-world capture protocol, resolution statistics) are not summarized; a short table or paragraph would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help strengthen the validation of our claims. We address each major point below and have made revisions to incorporate the suggested ablations and quantitative comparisons in the revised manuscript.
Point-by-point responses
-
Referee: [Abstract and §3 (method)] The headline claim of competitive perceptual quality rests on the assumption that F16 VAE compression preserves sufficient high-frequency content for the one-step Flux model to match or exceed patch-based Real-ISR methods. This is load-bearing for the central contribution yet is not directly tested; extreme latent downsampling inherently discards fine spatial frequencies, and neither RFR nor L_AP can restore lost detail. A quantitative ablation (e.g., frequency-domain energy comparison or LPIPS/perceptual metrics before/after F16 encoding) is required in the experiments section to support the claim.
Authors: We agree that a direct test of high-frequency preservation under F16 compression is valuable for substantiating the central claim. In the revised manuscript, we have added a new ablation study in Section 4 that includes frequency-domain energy spectrum comparisons (via FFT magnitude analysis) and LPIPS/perceptual metric evaluations on images before and after F16 VAE encoding/decoding. These results show that while some high-frequency energy is attenuated, the global context modeling in the one-step Flux backbone combined with RFR and L_AP enables recovery of perceptually relevant details sufficient to match patch-based baselines. We have also clarified in §3 how the periodicity suppression mechanisms mitigate the impact of any residual compression artifacts. revision: yes
-
Referee: [Abstract and §4 (experiments)] The abstract states 'extensive experiments demonstrate competitive perceptual quality' but supplies no numerical values, error bars, or table references. Without these, the reader cannot assess whether the 5.75 s runtime trades off sharpness for coherence. The experiments section must include direct side-by-side metrics (PSNR, LPIPS, NIQE, user study) against at least two recent patch-based 4K baselines.
Authors: We concur that explicit numerical results and direct comparisons are necessary for readers to evaluate the efficiency-quality tradeoff. In the revised version, we have updated the abstract to reference key metrics (e.g., LPIPS and NIQE values with comparisons) and expanded Section 4 with a new table providing side-by-side results against two recent patch-based 4K Real-ISR baselines. The table includes PSNR, LPIPS, NIQE, and user study scores (with standard deviations), demonstrating that OP4KSR achieves competitive perceptual quality at substantially lower latency. Error bars are reported for all metrics where applicable. revision: yes
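The frequency-domain ablation that referee and authors agree on can be prototyped in a few lines. This is a hedged sketch: the radial cutoff and the box-blur stand-in for a lossy VAE round trip are illustrative choices, not the paper's protocol.

```python
import numpy as np

def highfreq_energy_ratio(img, cutoff=0.25):
    """Fraction of FFT energy above a radial frequency cutoff
    (in cycles/pixel; Nyquist is 0.5). Comparing this ratio before and
    after an F16 VAE encode/decode round trip would quantify how much
    high-frequency content the compression attenuates."""
    spec = np.abs(np.fft.fft2(img)) ** 2
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return spec[radius > cutoff].sum() / spec.sum()

rng = np.random.default_rng(0)
sharp = rng.standard_normal((64, 64))
# Stand-in for a lossy round trip: a 3x3 circular box blur.
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
print(highfreq_energy_ratio(sharp), highfreq_energy_ratio(blurred))
```

The blurred image loses a clear majority of its above-cutoff energy, which is the kind of gap such an ablation would have to show is perceptually recoverable.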
Circularity Check
No significant circularity detected
Full rationale
The paper presents an empirical engineering solution for one-step 4K SR using the Flux backbone, F16 VAE compression, RoPE rescaling (RFR), and an autocorrelation loss (L_AP) to suppress observed periodic artifacts. These components are introduced as targeted fixes for memory limits and artifact patterns rather than predictions or results derived from the method itself. No equations reduce by construction to inputs, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or uniqueness theorems are invoked. The derivation chain remains self-contained with external experimental validation on curated datasets.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Breath1024.lean · period-8 / 8-tick oscillator neutrality · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
autocorrelation-based periodicity loss (L_AP) ... lags: K = {8, 16, 24, 32, 40} ... 32-pixel grid artifacts ... fundamental period of 32 pixels
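One plausible reading of the quoted loss can be sketched in numpy. The exact form of L_AP is not given in the passage, so the normalization and squared-correlation aggregation below are assumptions; only the lag set {8, 16, 24, 32, 40} comes from the quote.

```python
import numpy as np

def periodicity_penalty(residual, lags=(8, 16, 24, 32, 40)):
    """Penalize periodic structure via normalized spatial autocorrelation:
    correlate the residual with circularly shifted copies of itself at
    each lag along both axes and sum the squared correlations. A
    32-pixel grid artifact lights up the lag-32 terms (and harmonics);
    an unstructured residual contributes almost nothing."""
    r = residual - residual.mean()
    denom = (r * r).sum() + 1e-12
    total = 0.0
    for lag in lags:
        for axis in (0, 1):
            corr = (r * np.roll(r, lag, axis=axis)).sum() / denom
            total += corr ** 2
    return total

rng = np.random.default_rng(0)
noise = rng.standard_normal((128, 128))
y, x = np.mgrid[:128, :128]
grid = np.sin(2 * np.pi * x / 32) + np.sin(2 * np.pi * y / 32)  # period-32 pattern
print(periodicity_penalty(noise), periodicity_penalty(grid))
```

The period-32 pattern scores orders of magnitude higher than white noise, which is the separation such a loss needs to steer training away from grid artifacts.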
-
IndisputableMonolith/Cost/FunctionalEquation.lean · J(x) = ½(x + x⁻¹) − 1 uniqueness + ratio symmetry · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
RoPE base frequency rescaling (RFR) ... θ from 10000 to 100 ... extends effective bandwidth of strong positional signals
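The effect of rescaling the RoPE base θ from 10000 to 100 can be sketched through the per-dimension spatial wavelengths it induces. The head dimension of 64 is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def rope_wavelengths(base, dim=64):
    """Spatial wavelength (in latent positions) of each RoPE frequency
    pair: standard RoPE uses frequencies base**(-2i/dim), so pair i
    completes one rotation every 2*pi*base**(2i/dim) positions."""
    i = np.arange(dim // 2)
    return 2 * np.pi * base ** (2 * i / dim)

w_10000 = rope_wavelengths(10000.0)
w_100 = rope_wavelengths(100.0)

# Lowering the base shrinks the longest wavelength from tens of
# thousands of positions to a few hundred, concentrating the
# embedding's bandwidth at scales a 4K latent grid actually spans.
print(w_10000.max(), w_100.max())
```

This matches the quoted claim that rescaling "extends effective bandwidth of strong positional signals": fewer dimensions are wasted on wavelengths far longer than the sequence.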
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 126–135 (2017)
- [2] Ai, Y., Zhou, X., Huang, H., Han, X., Chen, Z., You, Q., Yang, H.: Dreamclear: High-capacity real-world image restoration with privacy-safe dataset curation. Advances in Neural Information Processing Systems 37, 55443–55469 (2024)
- [3] Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., et al.: Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)
- [4] Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., et al.: Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)
- [5] Chen, B., Li, G., Wu, R., Zhang, X., Chen, J., Zhang, J., Zhang, L.: Adversarial diffusion compression for real-world image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 28208–28220 (2025)
- [6] Chen, C., Mo, J., Hou, J., Wu, H., Liao, L., Sun, W., Yan, Q., Lin, W.: Topiq: A top-down approach from semantics to distortions for image quality assessment. IEEE Transactions on Image Processing 33, 2404–2418 (2024)
- [7] Chen, H., Chen, J., Pan, J., Dong, J.: Bridging fidelity-reality with controllable one-step diffusion for image super-resolution. arXiv preprint arXiv:2512.14061 (2025)
- [8] Chen, J., Ge, C., Xie, E., Wu, Y., Yao, L., Ren, X., Wang, Z., Luo, P., Lu, H., Li, Z.: Pixart-σ: Weak-to-strong training of diffusion transformer for 4k text-to-image generation. In: European Conference on Computer Vision. pp. 74–91. Springer (2024)
- [9] Chen, J., Pan, J., Dong, J.: Faithdiff: Unleashing diffusion priors for faithful image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 28188–28197 (2025)
- [10] Dai, T., Cai, J., Zhang, Y., Xia, S.T., Zhang, L.: Second-order attention network for single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11065–11074 (2019)
- [11] Deng, C., Chen, Z., Yu, L., Zhang, K., Zhou, X., Zhang, W.: Joint geometric and trajectory consistency learning for one-step real-world super-resolution. arXiv preprint arXiv:2602.24240 (2026)
- [12] Deng, C., Zhang, K., Yang, L., Zhang, W., Yu, L.: Ihmambasr: An importance-guided hierarchical mamba with dynamic prompt for single image super-resolution. Pattern Recognition 175, 113057 (2026)
- [13] Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(5), 2567–2581 (2020)
- [14] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 295–307 (2015)
- [15] Dong, L., Fan, Q., Guo, Y., Wang, Z., Zhang, Q., Chen, J., Luo, Y., Zou, C.: Tsd-sr: One-step diffusion with target score distillation for real-world image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 23174–23184 (2025)
- [16] Dong, L., Fan, Q., Yu, Y., Zhang, Q., Chen, J., Luo, Y., Zou, C.: Tinysr: Pruning diffusion for real-world image super-resolution. arXiv preprint arXiv:2508.17434 (2025)
- [17] Duan, Z.P., Zhang, J., Jin, X., Zhang, Z., Xiong, Z., Zou, D., Ren, J.S., Guo, C., Li, C.: Dit4sr: Taming diffusion transformer for real-world image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 18948–18958 (2025)
- [18] Fei, S., Ye, T., Wang, L., Zhu, L.: Lucidflux: Caption-free universal image restoration via a large-scale diffusion transformer. arXiv preprint arXiv:2509.22414 (2025)
- [19] Gankhuyag, G., Yoon, K., Park, J., Son, H.S., Min, K.: Lightweight real-time image super-resolution network for 4k images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1746–1755 (2023)
- [20] Guo, H., Li, J., Dai, T., Ouyang, Z., Ren, X., Xia, S.T.: Mambair: A simple baseline for image restoration with state-space model. In: European Conference on Computer Vision. pp. 222–241. Springer (2024)
- [21] Jähne, B.: Digital image processing. Springer (2005)
- [22] Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: Musiq: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5148–5157 (2021)
- [23] Labs, B.F.: Flux. https://github.com/black-forest-labs/flux (2023)
- [24] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4681–4690 (2017)
- [25] Li, B., Zhao, H., Wang, W., Hu, P., Gou, Y., Peng, X.: Mair: A locality- and continuity-preserving mamba for image restoration. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 7491–7501 (2025)
- [26] Li, J., Cao, J., Guo, Y., Li, W., Zhang, Y.: One diffusion step to real-world super-resolution via flow trajectory distillation. arXiv preprint arXiv:2502.01993 (2025)
- [27] Li, Y., Zhang, K., Liang, J., Cao, J., Liu, C., Gong, R., Zhang, Y., Tang, H., Liu, Y., Demandolx, D., et al.: Lsdir: A large scale dataset for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1775–1787 (2023)
- [28] Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1833–1844 (2021)
- [30] Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 136–144 (2017)
- [31] Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: Diffbir: Toward blind image restoration with generative diffusion prior. In: European Conference on Computer Vision. pp. 430–448. Springer (2024)
- [32] Lin, X., Yu, F., Hu, J., You, Z., Shi, W., Ren, J.S., Gu, J., Dong, C.: Harnessing diffusion-yielded score priors for image restoration. ACM Transactions on Graphics (TOG) 44(6), 1–21 (2025)
- [33] Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022)
- [34] Long, W., Zhou, X., Zhang, L., Gu, S.: Progressive focused transformer for single image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2279–2288 (2025)
- [35] Mei, Y., Fan, Y., Zhou, Y.: Image super-resolution with non-local sparse attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3517–3526 (2021)
- [36] Mou, C., Wu, Y., Wang, X., Dong, C., Zhang, J., Shan, Y.: Metric learning based interactive modulation for real-world super-resolution. In: European Conference on Computer Vision. pp. 723–740. Springer (2022)
- [37] Peng, L., Di, X., Feng, Z., Li, W., Pei, R., Wang, Y., Fu, X., Cao, Y., Zha, Z.J.: Directing mamba to complex textures: An efficient texture-aware state space model for image restoration. arXiv preprint arXiv:2501.16583 (2025)
- [38] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)
- [39] Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., et al.: Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35, 25278–25294 (2022)
- [40] Sun, H., Jiang, L., Li, F., Pei, R., Wang, Z., Guo, Y., Xu, J., Chen, H., Han, J., Song, F., et al.: Pocketsr: The super-resolution expert in your pocket mobiles. arXiv preprint arXiv:2510.03012 (2025)
- [41] Sun, L., Wu, R., Ma, Z., Liu, S., Yi, Q., Zhang, L.: Pixel-level and semantic-level adjustable super-resolution: A dual-lora approach. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2333–2343 (2025)
- [42] Tai, Y., Xie, R., Zhao, C., Zhang, K., Zhang, Z., Zhou, J., Yang, J.: Addsr: Accelerating diffusion-based blind super-resolution with adversarial diffusion distillation. Pattern Recognition p. 113012 (2026)
- [43] Talebi, H., Milanfar, P.: Nima: Neural image assessment. IEEE Transactions on Image Processing 27(8), 3998–4011 (2018)
- [44] Wang, J., Chan, K.C., Loy, C.C.: Exploring clip for assessing the look and feel of images. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 2555–2563 (2023)
- [45] Wang, J., Yue, Z., Zhou, S., Chan, K.C., Loy, C.C.: Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision 132(12), 5929–5949 (2024)
- [46] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1905–1914 (2021)
- [47] Wang, Y., Yang, W., Chen, X., Wang, Y., Guo, L., Chau, L.P., Liu, Z., Qiao, Y., Kot, A.C., Wen, B.: Sinsr: Diffusion-based image super-resolution in a single step. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 25796–25805 (2024)
- [48] Wang, Z., Lu, C., Wang, Y., Bao, F., Li, C., Su, H., Zhu, J.: Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems 36, 8406–8441 (2023)
- [49] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
- [50] Wei, Y., Gu, S., Li, Y., Timofte, R., Jin, L., Song, H.: Unsupervised real-world image super resolution via domain-distance aware training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13385–13394 (2021)
- [51] Wu, R., Sun, L., Ma, Z., Zhang, L.: One-step effective diffusion network for real-world image super-resolution. Advances in Neural Information Processing Systems 37, 92529–92553 (2024)
- [52] Wu, R., Yang, T., Sun, L., Zhang, Z., Li, S., Zhang, L.: Seesr: Towards semantics-aware real-world image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 25456–25467 (2024)
- [53] Wu, Z., Sun, Z., Zhou, T., Fu, B., Cong, J., Dong, Y., Zhang, H., Tang, X., Chen, M., Wei, X.: Omgsr: You only need one mid-timestep guidance for real-world image super-resolution. arXiv preprint arXiv:2508.08227 (2025)
- [54] Wu, Z., Zheng, S., Jiang, P.T., Yuan, X.: Realism control one-step diffusion for real-world image super-resolution. arXiv preprint arXiv:2509.10122 (2025)
- [55] Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., Wang, J., Yang, Y.: Maniqa: Multi-dimension attention network for no-reference image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1191–1200 (2022)
- [56] Yang, T., Wu, R., Ren, P., Xie, X., Zhang, L.: Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. In: European Conference on Computer Vision. pp. 74–91. Springer (2024)
- [57] Ye, T., Fei, S., Zhu, L.: Ultraflux: Data-model co-design for high-quality native 4k text-to-image generation across diverse aspect ratios. arXiv preprint arXiv:2511.18050 (2025)
- [58] Yi, Q., Li, S., Wu, R., Sun, L., Wu, Y., Zhang, L.: Fine-structure preserved real-world image super-resolution via transfer vae training. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12415–12426 (2025)
- [59] Yoon, K., Gankhuyag, G., Park, J., Son, H., Min, K.: Casr: Efficient cascade network structure with channel aligned method for 4k real-time single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7911–7920 (2024)
- [60] You, W., Zhang, M., Zhang, L., Zhou, X., Shi, K., Gu, S.: Consistency trajectory matching for one-step generative super-resolution. arXiv preprint arXiv:2503.20349 (2025)
- [61] Yu, D., Min, W., Jin, X., Jiang, Q., Jin, Y., Jiang, S.: Diverse and high-quality food image generation from only food names. ACM Trans. Multimedia Comput. Commun. Appl. 21(5) (May 2025)
- [62] Yu, D., Min, W., Jin, X., Jiang, Q., Yao, S., Jiang, S.: Food3d: Text-driven customizable 3d food generation with gaussian splatting. IEEE Transactions on Image Processing 34, 7290–7304 (2025)
- [63] Yu, F., Gu, J., Li, Z., Hu, J., Kong, X., Wang, X., He, J., Qiao, Y., Dong, C.: Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 25669–25680 (2024)
- [64] Yue, Z., Liao, K., Loy, C.C.: Arbitrary-steps image super-resolution via diffusion inversion. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 23153–23163 (2025)
- [65] Yue, Z., Wang, J., Loy, C.C.: Resshift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems 36, 13294–13307 (2023)
- [66] Zamfir, E., Conde, M.V., Timofte, R.: Towards real-time 4k image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1522–1532 (2023)
- [67] Zhang, A., Yue, Z., Pei, R., Ren, W., Cao, X.: Degradation-guided one-step image super-resolution with diffusion priors. arXiv preprint arXiv:2409.17058 (2024)
- [68] Zhang, J., Huang, Q., Liu, J., Guo, X., Huang, D.: Diffusion-4k: Ultra-high-resolution image synthesis with latent diffusion models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 23464–23473 (2025)
- [69] Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4791–4800 (2021)
- [70] Zhang, L., Li, Y., Zhou, X., Zhao, X., Gu, S.: Transcending the limit of local window: Advanced super-resolution transformer with adaptive token dictionary. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2856–2865 (2024)
- [71] Zhang, L., You, W., Shi, K., Gu, S.: Uncertainty-guided perturbation for image super-resolution diffusion model. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 17980–17989 (2025)
- [72] Zhang, L., Zhang, L., Bovik, A.C.: A feature-enriched completely blind image quality evaluator. IEEE Transactions on Image Processing 24(8), 2579–2591 (2015)
- [73] Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3836–3847 (2023)
- [74] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)
- [75] Zhou, Y., Li, Z., Guo, C.L., Bai, S., Cheng, M.M., Hou, Q.: Srformer: Permuted self-attention for single image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12780–12791 (2023)
- [76] Zhu, L., Li, J., Qin, H., Li, W., Zhang, Y., Guo, Y., Yang, X.: Passionsr: Post-training quantization with adaptive scale in one-step diffusion based image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 12778–12788 (2025)
- [77] Zhu, Y., Wang, R., Lu, S., Li, J., Yan, H., Zhang, K.: Oftsr: One-step flow for image super-resolution with tunable fidelity-realism trade-offs. arXiv preprint arXiv:2412.09465 (2024)
- [78] Zuo, Y., Zheng, Q., Wu, M., Jiang, X., Li, R., Wang, J., Zhang, Y., Mai, G., Wang, L.V., Zou, J., et al.: 4kagent: Agentic any image to 4k super-resolution. arXiv preprint arXiv:2507.07105 (2025)