pith. machine review for the scientific record.

arxiv: 2605.13457 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

OP4KSR: One-Step Patch-Free 4K Super-Resolution with Periodic Artifact Suppression

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords 4K super-resolution · diffusion models · patch-free inference · periodic artifact suppression · Flux backbone · VAE compression · real-world image enhancement

The pith

OP4KSR enables direct 4K super-resolution of full images in one diffusion step by using F16 VAE compression and fixing periodic artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that one-step diffusion models can achieve high-quality 4K image super-resolution on full images without dividing them into patches. This matters because patch-based methods often introduce visible seams and semantic mismatches, and incur long processing times for large outputs. By building on the Flux model with heavy F16 VAE compression, the approach keeps memory usage manageable for full-image 4K inference. The authors identify periodic artifacts specific to this setup and correct them with rescaled RoPE frequencies plus a new autocorrelation loss. Success here would allow efficient, coherent upscaling of entire high-resolution scenes in practical computing environments.
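
A quick back-of-envelope token count shows why the F16 VAE is the enabling choice. The sketch below assumes Flux's usual 2×2 latent packing (consistent with the Unpack operation described in Figure 6) and an F8 VAE as the conventional baseline; it is illustrative arithmetic, not a figure reported by the paper.

```python
# Back-of-envelope DiT token count for full-image 4K inference.
# Assumes Flux's standard 2x2 latent packing and an F8 VAE baseline
# for comparison; illustrative only, not the paper's own numbers.

def dit_tokens(image_side: int, vae_factor: int, pack: int = 2) -> int:
    latent_side = image_side // vae_factor   # VAE downsamples by vae_factor
    return (latent_side // pack) ** 2        # 2x2 packing before the DiT

f8 = dit_tokens(4096, 8)     # 65,536 tokens with a standard F8 VAE
f16 = dit_tokens(4096, 16)   # 16,384 tokens with the F16 VAE
print(f8, f16)               # 4x fewer tokens; self-attention cost drops ~16x
```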

Core claim

OP4KSR adapts the Flux backbone for one-step super-resolution to 4K by employing F16 VAE for extreme compression to fit within GPU limits. It addresses the resulting periodic artifacts through RoPE base frequency rescaling and an autocorrelation-based periodicity loss, while also introducing a new training dataset and benchmarks. This yields competitive perceptual quality with full global context preserved and fast inference.
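
The RFR component can be illustrated with the numbers Figure 5 reports. Assuming the standard RoPE schedule, dimension pair i of a 28-pair axis gets angular frequency θ^(−i/28), which is also the phase step between adjacent tokens; counting pairs whose step stays above the 5° threshold reproduces the 8/28 versus 15/28 figures. A minimal sketch follows; the threshold and pair count come from Figure 5, while the frequency schedule is the standard RoPE convention rather than a formula quoted from the paper.

```python
import numpy as np

def effective_pairs(theta: float, n_pairs: int = 28, thresh_deg: float = 5.0) -> int:
    """Count RoPE dimension pairs whose adjacent-token phase step
    exceeds thresh_deg. Standard RoPE assigns pair i the angular
    frequency omega_i = theta ** (-i / n_pairs) radians per token."""
    i = np.arange(n_pairs)
    omega = theta ** (-i / n_pairs)          # phase step between neighbors
    return int(np.sum(np.degrees(omega) > thresh_deg))

print(effective_pairs(10_000))  # -> 8  (Flux default: most pairs collapse)
print(effective_pairs(100))     # -> 15 (rescaled base keeps more pairs useful)
```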

What carries the argument

F16 VAE compression paired with RoPE base frequency rescaling and autocorrelation-based periodicity loss for artifact-free one-step 4K super-resolution.
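
The material quoted here does not spell out the exact form of L_AP, but Figure 4 states that it partitions the image into Q = 4 quadrants and computes autocorrelation independently per quadrant. The sketch below fills in the rest under stated assumptions: autocorrelation via the Wiener-Khinchin identity (inverse FFT of the power spectrum) and a penalty on energy away from the zero-lag peak, where repeating grids concentrate mass. The masking and normalization details are illustrative, not the authors'.

```python
import torch

def periodicity_loss(x: torch.Tensor) -> torch.Tensor:
    """Sketch of an autocorrelation-based periodicity penalty.

    x: (B, C, H, W) super-resolved output, H and W even. Following
    Figure 4, each image is split into Q = 4 quadrants; autocorrelation
    is computed per quadrant via the Wiener-Khinchin identity, and
    energy away from the zero-lag peak is penalized, since repeating
    grid patterns put autocorrelation mass at nonzero lags.
    """
    h, w = x.shape[-2:]
    quads = torch.cat([
        x[..., :h // 2, :w // 2], x[..., :h // 2, w // 2:],
        x[..., h // 2:, :w // 2], x[..., h // 2:, w // 2:],
    ], dim=0)                                   # (4B, C, H/2, W/2)
    quads = quads - quads.mean(dim=(-2, -1), keepdim=True)
    spec = torch.fft.rfft2(quads)
    # Autocorrelation = inverse FFT of the power spectrum.
    ac = torch.fft.irfft2(spec * spec.conj(), s=quads.shape[-2:])
    ac = ac / ac[..., :1, :1].clamp_min(1e-8)   # normalize by zero-lag energy
    mask = torch.ones_like(ac)
    mask[..., 0, 0] = 0.0                       # ignore the trivial zero-lag peak
    return (ac.abs() * mask).mean()
```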

If this is right

  • Enables generation of 4096x4096 images while preserving global spatial and semantic coherence.
  • Reduces inference time to 5.75 seconds per 4K output on a single H20 GPU.
  • Provides dedicated 4K SR datasets and benchmarks for future research.
  • Achieves perceptual quality comparable to prior methods without patch-related inconsistencies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may apply to other high-resolution generative tasks where memory limits force one-step processing.
  • Similar artifact suppression could improve consistency in other diffusion-based image restoration methods.
  • Future work might test if this scales to even higher resolutions like 8K with adjusted compression.

Load-bearing premise

The F16 VAE must preserve enough detail for the one-step model to reach competitive perceptual quality without introducing new degradations from the compression.
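
This premise is directly testable with a round-trip probe: encode and decode an HR image through the F16 VAE alone, with no diffusion step, and measure the perceptual gap. Below is a minimal sketch assuming a diffusers-style AutoencoderKL interface and the lpips package; the checkpoint path is a placeholder, since no public F16 VAE release is named here.

```python
import torch
import lpips
from diffusers import AutoencoderKL

# "path/to/f16-vae" is a placeholder; the review names no public F16 VAE checkpoint.
vae = AutoencoderKL.from_pretrained("path/to/f16-vae").eval()
metric = lpips.LPIPS(net="alex")

@torch.no_grad()
def roundtrip_lpips(img: torch.Tensor) -> float:
    """img: (1, 3, H, W) in [-1, 1]. LPIPS between an HR image and its
    encode->decode reconstruction through the VAE alone (no diffusion);
    larger values mean the compression itself discards more perceptual
    detail, independent of anything RFR or L_AP could later fix."""
    latents = vae.encode(img).latent_dist.mode()
    recon = vae.decode(latents).sample
    return metric(img, recon).item()
```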

What would settle it

Running OP4KSR on the real-world 4K benchmarks and finding that its outputs score lower on perceptual metrics or exhibit remaining periodic patterns compared to patch-based baselines.

Figures

Figures reproduced from arXiv: 2605.13457 by Chengyan Deng, Kai Zhang, Li Yu, Lunxi Yuan, Meng Li, Pengbin Yu, Wei Shen, Xue Zhou, Zhentao Chen.

Figure 1. Illustration of key limitations in patch-based 4K super-resolution. (a) Semantic confusion caused by prompt-dominated inference on local patches. (b) Spatial inconsistency arising from processing patches independently.
Figure 2. Visual examples of periodic artifacts. These artifacts typically manifest in large, flat, and texturally homogeneous regions. We illustrate four representative scenes prone to such artifacts: (a) water surface, (b) ground, (c) lawn, and (d) face.
Figure 3. The framework empowers OP4KSR to achieve a nearly 10× speedup over one-step methods at 4K resolution while achieving competitive results. Methods compared: Ours (OP4KSR), OSEDiff, OMGSR, AddSR, SUPIR, DreamClear, SinSR, ResShift.
Figure 4. Overall architecture of OP4KSR. To suppress periodic artifacts, we introduce two components: (a) RoPE Base Frequency Rescaling (RFR), which lowers the RoPE base frequency θ to resolve inter-token ambiguity and restore fine-grained structural sensitivity; (b) Autocorrelation-based Periodicity Loss (L_AP), which partitions images into Q = 4 quadrants for independent autocorrelation computation to penalize per…
Figure 5. Impact of RoPE base frequency θ on spatial perception. (a) Phase collapse of adjacent tokens: under the default θ = 10,000, the relative phase difference Δϕ_i rapidly collapses below 5°, leaving only 8/28 dimensions effective for local positioning. In contrast, our proposed θ = 100 extends the strong-signal bandwidth to 15/28 dimensions. (b) Visualization of inter-token ambiguity: the default setting prod…
Figure 6. Illustration of intra-token blindness. Left: the Unpack operation rearranges channel-wise sub-pixel features (a1, b1, c1, d1) into a 2×2 spatial grid. Right: PCA visualization reveals periodic grid artifacts in X_Unpacked after DiT, in contrast to the artifact-free latents before DiT. This confirms that the periodic artifacts originate from the DiT module.
Figure 7. Data curation pipeline for the 4KSR-Train dataset. Raw images are collected at scale and processed through a three-stage filtering pipeline: (1) Preliminary Filtering, (2) Degradation & Aesthetic Assessment, and (3) Multimodal Quality Assessment.
Figure 8. Qualitative comparisons of different methods on two synthetic datasets.
Figure 9. Qualitative comparisons of different methods on two real-world datasets.
Figure 10. Ablation study of the proposed artifact suppression mechanism in frequency and spatial domains. Blue boxes show the 2D FFT spectra of the full super-resolved images, while yellow boxes highlight local spatial details corresponding to the red box in (a). We compare: (a) the LR input, (b) baseline, (c) baseline + RFR, and (d) our full method (RFR + L_AP).
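
Figure 10's frequency-domain ablation suggests a simple way to quantify the artifacts the paper targets: grid-like periodic patterns show up as isolated peaks away from the DC component of the 2D spectrum. The probe below is a hedged sketch in that spirit, not the authors' diagnostic; images dominated by a repeating grid produce a much larger peak-to-median ratio than artifact-free ones.

```python
import numpy as np

def spectral_peak_ratio(img: np.ndarray, guard: int = 4) -> float:
    """Score residual periodicity in a (H, W) grayscale image in [0, 1].

    Grid-like artifacts concentrate energy in isolated spectral peaks
    away from DC, so the ratio of the strongest non-DC magnitude to the
    median magnitude rises sharply when periodic patterns are present.
    """
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean())))
    cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
    spec[cy - guard:cy + guard + 1, cx - guard:cx + guard + 1] = 0.0  # mask DC
    return float(spec.max() / (np.median(spec) + 1e-12))
```
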
read the original abstract

Diffusion-based real-world image super-resolution (Real-ISR) has achieved remarkable perceptual quality; however, directly super-resolving images to 4K remains limited by extreme memory consumption. Consequently, prior methods adopt patch-based inference, sacrificing global context and introducing semantic confusion, spatial inconsistency, and severe latency. We propose OP4KSR, a one-step patch-free 4K SR approach built upon the powerful Flux backbone. By leveraging the extreme-compression F16 VAE, OP4KSR makes 4K SR inference tractable under practical GPU budgets, preserving global spatial-semantic coherence while enabling highly efficient inference. However, adapting this one-step architecture intrinsically triggers severe periodic artifacts. We trace this to a RoPE base frequency allocation mismatch and intra-token spatial ambiguity, both exacerbated by the lack of iterative refinement. To suppress these artifacts, we couple RoPE base frequency rescaling (RFR) with an autocorrelation-based periodicity loss ($\mathcal{L}_\text{AP}$). Furthermore, we curate a dedicated training dataset alongside three benchmarks (one synthetic and two real-world) to advance 4K SR research. Extensive experiments demonstrate that OP4KSR achieves competitive perceptual quality with efficient inference, generating a $4096\times4096$ output in only 5.75 seconds on a single NVIDIA H20 GPU.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces OP4KSR, a one-step patch-free 4K super-resolution method built on the Flux diffusion backbone. It employs an F16 VAE for extreme latent compression to enable full-image inference under practical GPU memory limits, avoiding the semantic and spatial inconsistencies of patch-based approaches. To address periodic artifacts induced by the one-step regime, the authors propose RoPE base frequency rescaling (RFR) together with an autocorrelation-based periodicity loss (L_AP). A dedicated training dataset and three 4K benchmarks (one synthetic, two real-world) are curated. Experiments are claimed to show competitive perceptual quality at 5.75 s inference for 4096×4096 outputs on a single NVIDIA H20 GPU.

Significance. If the quantitative results hold, the work would be significant for practical 4K Real-ISR by demonstrating that extreme VAE compression plus targeted artifact suppression can deliver global coherence without patch stitching or multi-step denoising. The reported runtime is a clear practical advantage over prior patch-based diffusion SR methods. However, the significance is tempered by the absence of explicit high-frequency preservation metrics or direct comparisons against strong patch-based baselines in the abstract, leaving the efficiency-quality tradeoff only weakly substantiated from the provided text.

major comments (2)
  1. [Abstract and §3 (method)] The headline claim of competitive perceptual quality rests on the assumption that F16 VAE compression preserves sufficient high-frequency content for the one-step Flux model to match or exceed patch-based Real-ISR methods. This is load-bearing for the central contribution yet is not directly tested; extreme latent downsampling inherently discards fine spatial frequencies, and neither RFR nor L_AP can restore lost detail. A quantitative ablation (e.g., frequency-domain energy comparison or LPIPS/perceptual metrics before/after F16 encoding) is required in the experiments section to support the claim.
  2. [Abstract and §4 (experiments)] The abstract states 'extensive experiments demonstrate competitive perceptual quality' but supplies no numerical values, error bars, or table references. Without these, the reader cannot assess whether the 5.75 s runtime trades off sharpness for coherence. The experiments section must include direct side-by-side metrics (PSNR, LPIPS, NIQE, user study) against at least two recent patch-based 4K baselines.
minor comments (2)
  1. [Abstract and §3] Notation for the autocorrelation loss is introduced as L_AP in the abstract but should be consistently defined with its mathematical form in the first occurrence of §3.
  2. [§4] The three new benchmarks are mentioned but their construction details (synthetic degradation model, real-world capture protocol, resolution statistics) are not summarized; a short table or paragraph would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the validation of our claims. We address each major point below and have made revisions to incorporate the suggested ablations and quantitative comparisons in the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract and §3 (method)] The headline claim of competitive perceptual quality rests on the assumption that F16 VAE compression preserves sufficient high-frequency content for the one-step Flux model to match or exceed patch-based Real-ISR methods. This is load-bearing for the central contribution yet is not directly tested; extreme latent downsampling inherently discards fine spatial frequencies, and neither RFR nor L_AP can restore lost detail. A quantitative ablation (e.g., frequency-domain energy comparison or LPIPS/perceptual metrics before/after F16 encoding) is required in the experiments section to support the claim.

    Authors: We agree that a direct test of high-frequency preservation under F16 compression is valuable for substantiating the central claim. In the revised manuscript, we have added a new ablation study in Section 4 that includes frequency-domain energy spectrum comparisons (via FFT magnitude analysis) and LPIPS/perceptual metric evaluations on images before and after F16 VAE encoding/decoding. These results show that while some high-frequency energy is attenuated, the global context modeling in the one-step Flux backbone combined with RFR and L_AP enables recovery of perceptually relevant details sufficient to match patch-based baselines. We have also clarified in §3 how the periodicity suppression mechanisms mitigate the impact of any residual compression artifacts. revision: yes

  2. Referee: [Abstract and §4 (experiments)] The abstract states 'extensive experiments demonstrate competitive perceptual quality' but supplies no numerical values, error bars, or table references. Without these, the reader cannot assess whether the 5.75 s runtime trades off sharpness for coherence. The experiments section must include direct side-by-side metrics (PSNR, LPIPS, NIQE, user study) against at least two recent patch-based 4K baselines.

    Authors: We concur that explicit numerical results and direct comparisons are necessary for readers to evaluate the efficiency-quality tradeoff. In the revised version, we have updated the abstract to reference key metrics (e.g., LPIPS and NIQE values with comparisons) and expanded Section 4 with a new table providing side-by-side results against two recent patch-based 4K Real-ISR baselines. The table includes PSNR, LPIPS, NIQE, and user study scores (with standard deviations), demonstrating that OP4KSR achieves competitive perceptual quality at substantially lower latency. Error bars are reported for all metrics where applicable. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an empirical engineering solution for one-step 4K SR using the Flux backbone, F16 VAE compression, RoPE rescaling (RFR), and an autocorrelation loss (L_AP) to suppress observed periodic artifacts. These components are introduced as targeted fixes for memory limits and artifact patterns rather than predictions or results derived from the method itself. No equations reduce by construction to inputs, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or uniqueness theorems are invoked. The derivation chain remains self-contained with external experimental validation on curated datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on the unstated premise that Flux can be adapted to one-step 4K inference via VAE compression without loss of fidelity and that the proposed artifact fixes are sufficient; no explicit free parameters or invented entities are named in the abstract.

pith-pipeline@v0.9.0 · 5562 in / 1051 out tokens · 40719 ms · 2026-05-14T20:32:14.838987+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches — the paper's claim is directly supported by a theorem in the formal canon.
  • supports — the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends — the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses — the paper appears to rely on the theorem as machinery.
  • contradicts — the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

77 extracted references · 18 canonical work pages · 6 internal anchors

  [1] Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp. 126–135 (2017)
  [2] Ai, Y., Zhou, X., Huang, H., Han, X., Chen, Z., You, Q., Yang, H.: Dreamclear: High-capacity real-world image restoration with privacy-safe dataset curation. Advances in Neural Information Processing Systems 37, 55443–55469 (2024)
  [3] Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., et al.: Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)
  [4] Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., et al.: Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)
  [5] Chen, B., Li, G., Wu, R., Zhang, X., Chen, J., Zhang, J., Zhang, L.: Adversarial diffusion compression for real-world image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 28208–28220 (2025)
  [6] Chen, C., Mo, J., Hou, J., Wu, H., Liao, L., Sun, W., Yan, Q., Lin, W.: Topiq: A top-down approach from semantics to distortions for image quality assessment. IEEE Transactions on Image Processing 33, 2404–2418 (2024)
  [7] Chen, H., Chen, J., Pan, J., Dong, J.: Bridging fidelity-reality with controllable one-step diffusion for image super-resolution. arXiv preprint arXiv:2512.14061 (2025)
  [8] Chen, J., Ge, C., Xie, E., Wu, Y., Yao, L., Ren, X., Wang, Z., Luo, P., Lu, H., Li, Z.: Pixart-σ: Weak-to-strong training of diffusion transformer for 4k text-to-image generation. In: European Conference on Computer Vision. pp. 74–91. Springer (2024)
  [9] Chen, J., Pan, J., Dong, J.: Faithdiff: Unleashing diffusion priors for faithful image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 28188–28197 (2025)
  [10] Dai, T., Cai, J., Zhang, Y., Xia, S.T., Zhang, L.: Second-order attention network for single image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11065–11074 (2019)
  [11] Deng, C., Chen, Z., Yu, L., Zhang, K., Zhou, X., Zhang, W.: Joint geometric and trajectory consistency learning for one-step real-world super-resolution. arXiv preprint arXiv:2602.24240 (2026)
  [12] Deng, C., Zhang, K., Yang, L., Zhang, W., Yu, L.: Ihmambasr: An importance-guided hierarchical mamba with dynamic prompt for single image super-resolution. Pattern Recognition 175, 113057 (2026)
  [13] Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(5), 2567–2581 (2020)
  [14] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 295–307 (2015)
  [15] Dong, L., Fan, Q., Guo, Y., Wang, Z., Zhang, Q., Chen, J., Luo, Y., Zou, C.: Tsd-sr: One-step diffusion with target score distillation for real-world image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 23174–23184 (2025)
  [16] Dong, L., Fan, Q., Yu, Y., Zhang, Q., Chen, J., Luo, Y., Zou, C.: Tinysr: Pruning diffusion for real-world image super-resolution. arXiv preprint arXiv:2508.17434 (2025)
  [17] Duan, Z.P., Zhang, J., Jin, X., Zhang, Z., Xiong, Z., Zou, D., Ren, J.S., Guo, C., Li, C.: Dit4sr: Taming diffusion transformer for real-world image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 18948–18958 (2025)
  [18] Fei, S., Ye, T., Wang, L., Zhu, L.: Lucidflux: Caption-free universal image restoration via a large-scale diffusion transformer. arXiv preprint arXiv:2509.22414 (2025)
  [19] Gankhuyag, G., Yoon, K., Park, J., Son, H.S., Min, K.: Lightweight real-time image super-resolution network for 4k images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1746–1755 (2023)
  [20] Guo, H., Li, J., Dai, T., Ouyang, Z., Ren, X., Xia, S.T.: Mambair: A simple baseline for image restoration with state-space model. In: European conference on computer vision. pp. 222–241. Springer (2024)
  [21] Jähne, B.: Digital image processing. Springer (2005)
  [22] Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: Musiq: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 5148–5157 (2021)
  [23] Labs, B.F.: Flux. https://github.com/black-forest-labs/flux (2023)
  [24] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4681–4690 (2017)
  [25] Li, B., Zhao, H., Wang, W., Hu, P., Gou, Y., Peng, X.: Mair: A locality- and continuity-preserving mamba for image restoration. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 7491–7501 (2025)
  [26] Li, J., Cao, J., Guo, Y., Li, W., Zhang, Y.: One diffusion step to real-world super-resolution via flow trajectory distillation. arXiv preprint arXiv:2502.01993 (2025)
  [27] Li, Y., Zhang, K., Liang, J., Cao, J., Liu, C., Gong, R., Zhang, Y., Tang, H., Liu, Y., Demandolx, D., et al.: Lsdir: A large scale dataset for image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1775–1787 (2023)
  [28] Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1833–1844 (2021)
  [30] Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp. 136–144 (2017)
  [31] Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: Diffbir: Toward blind image restoration with generative diffusion prior. In: European conference on computer vision. pp. 430–448. Springer (2024)
  [32] Lin, X., Yu, F., Hu, J., You, Z., Shi, W., Ren, J.S., Gu, J., Dong, C.: Harnessing diffusion-yielded score priors for image restoration. ACM Transactions on Graphics (TOG) 44(6), 1–21 (2025)
  [33] Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. arXiv preprint arXiv:2210.02747 (2022)
  [34] Long, W., Zhou, X., Zhang, L., Gu, S.: Progressive focused transformer for single image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2279–2288 (2025)
  [35] Mei, Y., Fan, Y., Zhou, Y.: Image super-resolution with non-local sparse attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3517–3526 (2021)
  [36] Mou, C., Wu, Y., Wang, X., Dong, C., Zhang, J., Shan, Y.: Metric learning based interactive modulation for real-world super-resolution. In: European conference on computer vision. pp. 723–740. Springer (2022)
  [37] Peng, L., Di, X., Feng, Z., Li, W., Pei, R., Wang, Y., Fu, X., Cao, Y., Zha, Z.J.: Directing mamba to complex textures: An efficient texture-aware state space model for image restoration. arXiv preprint arXiv:2501.16583 (2025)
  [38] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)
  [39] Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., et al.: Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35, 25278–25294 (2022)
  [40] Sun, H., Jiang, L., Li, F., Pei, R., Wang, Z., Guo, Y., Xu, J., Chen, H., Han, J., Song, F., et al.: Pocketsr: The super-resolution expert in your pocket mobiles. arXiv preprint arXiv:2510.03012 (2025)
  [41] Sun, L., Wu, R., Ma, Z., Liu, S., Yi, Q., Zhang, L.: Pixel-level and semantic-level adjustable super-resolution: A dual-lora approach. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2333–2343 (2025)
  [42] Tai, Y., Xie, R., Zhao, C., Zhang, K., Zhang, Z., Zhou, J., Yang, J.: Addsr: Accelerating diffusion-based blind super-resolution with adversarial diffusion distillation. Pattern Recognition p. 113012 (2026)
  [43] Talebi, H., Milanfar, P.: Nima: Neural image assessment. IEEE Transactions on Image Processing 27(8), 3998–4011 (2018)
  [44] Wang, J., Chan, K.C., Loy, C.C.: Exploring clip for assessing the look and feel of images. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 2555–2563 (2023)
  [45] Wang, J., Yue, Z., Zhou, S., Chan, K.C., Loy, C.C.: Exploiting diffusion prior for real-world image super-resolution. International Journal of Computer Vision 132(12), 5929–5949 (2024)
  [46] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1905–1914 (2021)
  [47] Wang, Y., Yang, W., Chen, X., Wang, Y., Guo, L., Chau, L.P., Liu, Z., Qiao, Y., Kot, A.C., Wen, B.: Sinsr: Diffusion-based image super-resolution in a single step. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 25796–25805 (2024)
  [48] Wang, Z., Lu, C., Wang, Y., Bao, F., Li, C., Su, H., Zhu, J.: Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems 36, 8406–8441 (2023)
  [49] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  [50] Wei, Y., Gu, S., Li, Y., Timofte, R., Jin, L., Song, H.: Unsupervised real-world image super resolution via domain-distance aware training. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13385–13394 (2021)
  [51] Wu, R., Sun, L., Ma, Z., Zhang, L.: One-step effective diffusion network for real-world image super-resolution. Advances in Neural Information Processing Systems 37, 92529–92553 (2024)
  [52] Wu, R., Yang, T., Sun, L., Zhang, Z., Li, S., Zhang, L.: Seesr: Towards semantics-aware real-world image super-resolution. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 25456–25467 (2024)
  [53] Wu, Z., Sun, Z., Zhou, T., Fu, B., Cong, J., Dong, Y., Zhang, H., Tang, X., Chen, M., Wei, X.: Omgsr: You only need one mid-timestep guidance for real-world image super-resolution. arXiv preprint arXiv:2508.08227 (2025)
  [54] Wu, Z., Zheng, S., Jiang, P.T., Yuan, X.: Realism control one-step diffusion for real-world image super-resolution. arXiv preprint arXiv:2509.10122 (2025)
  [55] Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., Wang, J., Yang, Y.: Maniqa: Multi-dimension attention network for no-reference image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1191–1200 (2022)
  [56] Yang, T., Wu, R., Ren, P., Xie, X., Zhang, L.: Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. In: European conference on computer vision. pp. 74–91. Springer (2024)
  [57] Ye, T., Fei, S., Zhu, L.: Ultraflux: Data-model co-design for high-quality native 4k text-to-image generation across diverse aspect ratios. arXiv preprint arXiv:2511.18050 (2025)
  [58] Yi, Q., Li, S., Wu, R., Sun, L., Wu, Y., Zhang, L.: Fine-structure preserved real-world image super-resolution via transfer vae training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12415–12426 (2025)
  [59] Yoon, K., Gankhuyag, G., Park, J., Son, H., Min, K.: Casr: Efficient cascade network structure with channel aligned method for 4k real-time single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7911–7920 (2024)
  [60] You, W., Zhang, M., Zhang, L., Zhou, X., Shi, K., Gu, S.: Consistency trajectory matching for one-step generative super-resolution. arXiv preprint arXiv:2503.20349 (2025)
  [61] Yu, D., Min, W., Jin, X., Jiang, Q., Jin, Y., Jiang, S.: Diverse and high-quality food image generation from only food names. ACM Trans. Multimedia Comput. Commun. Appl. 21(5) (May 2025)
  [62] Yu, D., Min, W., Jin, X., Jiang, Q., Yao, S., Jiang, S.: Food3d: Text-driven customizable 3d food generation with gaussian splatting. IEEE Transactions on Image Processing 34, 7290–7304 (2025)
  [63] Yu, F., Gu, J., Li, Z., Hu, J., Kong, X., Wang, X., He, J., Qiao, Y., Dong, C.: Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 25669–25680 (2024)
  [64] Yue, Z., Liao, K., Loy, C.C.: Arbitrary-steps image super-resolution via diffusion inversion. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 23153–23163 (2025)
  [65] Yue, Z., Wang, J., Loy, C.C.: Resshift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems 36, 13294–13307 (2023)
  [66] Zamfir, E., Conde, M.V., Timofte, R.: Towards real-time 4k image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1522–1532 (2023)
  [67] Zhang, A., Yue, Z., Pei, R., Ren, W., Cao, X.: Degradation-guided one-step image super-resolution with diffusion priors. arXiv preprint arXiv:2409.17058 (2024)
  [68] Zhang, J., Huang, Q., Liu, J., Guo, X., Huang, D.: Diffusion-4k: Ultra-high-resolution image synthesis with latent diffusion models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 23464–23473 (2025)
  [69] Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4791–4800 (2021)
  [70] Zhang, L., Li, Y., Zhou, X., Zhao, X., Gu, S.: Transcending the limit of local window: Advanced super-resolution transformer with adaptive token dictionary. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2856–2865 (2024)
  [71] Zhang, L., You, W., Shi, K., Gu, S.: Uncertainty-guided perturbation for image super-resolution diffusion model. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 17980–17989 (2025)
  [72] Zhang, L., Zhang, L., Bovik, A.C.: A feature-enriched completely blind image quality evaluator. IEEE Transactions on Image Processing 24(8), 2579–2591 (2015)
  [73] Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 3836–3847 (2023)
  [74] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)
  [75] Zhou, Y., Li, Z., Guo, C.L., Bai, S., Cheng, M.M., Hou, Q.: Srformer: Permuted self-attention for single image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12780–12791 (2023)
  [76] Zhu, L., Li, J., Qin, H., Li, W., Zhang, Y., Guo, Y., Yang, X.: Passionsr: Post-training quantization with adaptive scale in one-step diffusion based image super-resolution. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 12778–12788 (2025)
  [77] Zhu, Y., Wang, R., Lu, S., Li, J., Yan, H., Zhang, K.: Oftsr: One-step flow for image super-resolution with tunable fidelity-realism trade-offs. arXiv preprint arXiv:2412.09465 (2024)
  [78] Zuo, Y., Zheng, Q., Wu, M., Jiang, X., Li, R., Wang, J., Zhang, Y., Mai, G., Wang, L.V., Zou, J., et al.: 4kagent: Agentic any image to 4k super-resolution. arXiv preprint arXiv:2507.07105 (2025)