Image Enhancement by Recurrently-trained Super-resolution Network

Nojun Kwak; Saem Park

arXiv: 1907.11341 · v1 · pith:OYJB4RSYnew · submitted 2019-07-26 · 📡 eess.IV · cs.CV · cs.LG

Pith reviewed 2026-05-24 15:41 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · cs.LG

keywords super-resolution · image enhancement · recurrent training · VIQET MOS · efficient network · target generation

The pith

Recurrently training a super-resolution network on targets it generates itself produces higher-quality images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a recurrent training strategy for super-resolution networks. An SR network is first trained on corrupted low-resolution images paired with original high-resolution ones. The trained network then generates new high-resolution images from uncorrupted originals; these are downscaled and used as improved targets for the next training round. The process repeats, and the authors claim that image quality improves with each iteration up to a limit, yielding a more efficient network. Readers would care because the approach aims to boost performance using the same simple network architecture without extra data or complexity.

Core claim

After initial training on pairs of corrupted LR images and original images, the SR network generates new HR images from uncorrupted inputs; downscaling those outputs creates new targets that, when used for repeated training rounds, produce better image quality up to a certain point and enable a more efficient SR network.

What carries the argument

The recurrent training loop that applies the SR network to uncorrupted images, downscales the outputs, and feeds those as new targets for the subsequent training stage.
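The loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: `recurrent_training`, `train_gain_model`, `downscale_mean`, and `upscale_nearest` are hypothetical names, images are toy 1-D lists, the "network" is a single least-squares gain, and Phase B is assumed to be applied to the original images at every stage, which is one reading of the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # toy 1-D "image"; a real pipeline would use 2-D arrays
Model = Callable[[Image], Image]
Trainer = Callable[[Sequence[Tuple[Image, Image]]], Model]


def recurrent_training(
    originals: Sequence[Image],
    train_sr: Trainer,
    downscale: Callable[[Image], Image],
    num_stages: int,
) -> Tuple[Model, List[Image]]:
    """Run num_stages rounds of (train SR network, regenerate targets)."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    lr_inputs = [downscale(img) for img in originals]  # LR = D(HR0), kept fixed
    targets = [list(img) for img in originals]         # stage-0 targets are HR0
    model: Model = lambda img: img                      # placeholder, replaced below
    for _ in range(num_stages):
        # Phase A: fit the SR network on (LR, current target) pairs.
        model = train_sr(list(zip(lr_inputs, targets)))
        # Phase B (assumed form): super-resolve the originals, then downscale
        # the outputs back to the original size to get the next targets.
        targets = [downscale(model(img)) for img in originals]
    return model, targets


# --- Toy stand-ins (hypothetical, for demonstration only) ---

def downscale_mean(img: Image) -> Image:
    """Halve the length by averaging adjacent pairs."""
    return [(img[i] + img[i + 1]) / 2.0 for i in range(0, len(img) - 1, 2)]


def upscale_nearest(img: Image) -> Image:
    """Double the length by repeating every sample."""
    return [v for v in img for _ in range(2)]


def train_gain_model(pairs: Sequence[Tuple[Image, Image]]) -> Model:
    """Least-squares fit of one gain g so that g * upscale(lr) approximates target."""
    num = sum(u * t for lr, tgt in pairs for u, t in zip(upscale_nearest(lr), tgt))
    den = sum(u * u for lr, _ in pairs for u in upscale_nearest(lr))
    gain = num / den if den else 1.0
    return lambda img: [gain * v for v in upscale_nearest(img)]


originals = [[1.0, 3.0, 2.0, 6.0]]
model, targets = recurrent_training(
    originals, train_gain_model, downscale_mean, num_stages=3
)
```

The toy model cannot add detail, so the targets stay at a fixed point; the example only exercises the control flow. In a real setting `train_sr` would wrap a full training run of the SR network, and the stage count would be the free parameter the ledger below refers to.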

If this is right

Repeating the training process multiple times yields progressively better images up to a limit.
The same simple network becomes more efficient and can be downsized while retaining performance.
The recurrent strategy offers a route to image enhancement that avoids larger convolution networks.
VIQET MOS provides a human-visual-quality measure superior to MSE for tracking the gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method shares traits with self-supervised bootstrapping, where models refine targets from their own predictions.
Similar recurrent target generation could be tested on related tasks such as denoising or deblurring.
Improvement may plateau or reverse after a few rounds if small errors accumulate in the generated targets.
Smaller networks trained this way might lower inference cost in resource-constrained settings.

Load-bearing premise

Downscaled outputs produced by the trained SR network from uncorrupted images act as higher-quality targets that improve results without adding bias or harming generalization.

What would settle it

Running the recurrent process for several iterations and observing that VIQET MOS scores stop rising or begin to fall would show the claim of progressive improvement does not hold.
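One way to operationalize this check is a simple stopping rule over per-stage quality scores. The sketch below is illustrative only: `last_improving_stage` and `min_gain` are hypothetical names, the scores are assumed to be supplied by the caller (for example from an external quality tool), and the numbers in the usage lines are placeholders rather than results from the paper.

```python
from typing import Sequence


def last_improving_stage(scores: Sequence[float], min_gain: float = 0.0) -> int:
    """Return the index of the last stage that beat the best earlier stage.

    scores[k] is the quality score of stage k (higher is better). The scan
    stops at the first stage whose gain over the current best is not larger
    than min_gain, mirroring a plateau-or-decline test.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = 0
    for stage in range(1, len(scores)):
        if scores[stage] - scores[best] > min_gain:
            best = stage
        else:
            break
    return best


# Placeholder scores, not measurements: quality rises, then drops.
placeholder_scores = [1.0, 2.0, 3.0, 2.5]
chosen = last_improving_stage(placeholder_scores)
```

With these placeholder values the rule selects stage 2, the last stage before the score falls. A nonzero `min_gain` would make the rule ignore gains too small to matter.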

Figures

Figures reproduced from arXiv: 1907.11341 by Nojun Kwak, Saem Park.

**Figure 2.** Recurrent Training Strategy (RTS). LR is a low resolution image obtained by down-scaling the original image (HR0). One stage is composed of the SR training phase (Phase A) and the image enhancement phase (Phase B). By successive application of these two phases, we can obtain a better SR network and enhanced images. Although it takes multiple times to learn, at inference time, it is possible to obtain ima…

**Figure 3.** Used network for recurrent learning. This network was developed exclusively for recursive learning systems. The biggest difference from the existing SR network is that it was designed to produce the best result at the blue layer where the residual is added with only 3 channels. The model is also very light and easy to control the number of layers.

Accompanying text, Eq. (3): SR(D(y)) = U(D(y)) + R(D(y)). Here, we can suppose D(U(…
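The decomposition SR(x) = U(x) + R(x) quoted with Figure 3 can be illustrated on a toy 1-D signal. This is a sketch under assumptions, not the paper's network: U is taken to be nearest-neighbour 2x upscaling, D a two-sample average, and `toy_residual` is a hypothetical stand-in for the learned residual branch R.

```python
from typing import Callable, List

Signal = List[float]


def upscale_nearest(x: Signal) -> Signal:
    """U: double the length by repeating every sample (assumed upscaler)."""
    return [v for v in x for _ in range(2)]


def downscale_mean(x: Signal) -> Signal:
    """D: halve the length by averaging adjacent pairs (assumed downscaler)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def super_resolve(x: Signal, residual: Callable[[Signal], Signal]) -> Signal:
    """SR(x) = U(x) + R(x): fixed upscaling plus a residual detail term."""
    base = upscale_nearest(x)
    detail = residual(x)
    if len(detail) != len(base):
        raise ValueError("residual must match the upscaled length")
    return [b + d for b, d in zip(base, detail)]


def toy_residual(x: Signal) -> Signal:
    """Hypothetical R: pushes each upscaled pair apart by -0.5 / +0.5."""
    out: Signal = []
    for _ in x:
        out.extend([-0.5, 0.5])
    return out


x = [1.0, 2.0]
sr = super_resolve(x, toy_residual)
```

For these particular stand-ins, D(U(x)) equals x and D is linear, so D(SR(x)) = x + D(R(x)): anything that changes after downscaling comes from the residual branch. The toy residual is zero-mean within each pair, so here the downscaled output equals the input.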

**Figure 4.** Change of image difference vs. stage. It can be seen that the difference converges to almost zero in HR4 (Yellow Line). But with repeated application of unsharp masking, we can observe the delta increases linearly (Green Line).

Accompanying text: HRN and the corresponding network SRN can be used as the final super resolution network. The important thing here is how to judge whether the output image gets better or not. As a r…
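The kind of comparison Figure 4 describes can be set up with two small helpers: a per-stage difference measure and a repeated unsharp-masking baseline. The sketch below makes several assumptions that the caption does not confirm: the difference is taken as the mean absolute difference between consecutive stages, signals are 1-D, and unsharp masking uses a 3-tap box blur with replicated edges. All function names are hypothetical.

```python
from typing import List, Sequence

Signal = List[float]


def mean_abs_diff(a: Signal, b: Signal) -> float:
    """Mean absolute difference between two equally sized signals."""
    if len(a) != len(b) or not a:
        raise ValueError("signals must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def stage_deltas(stages: Sequence[Signal]) -> List[float]:
    """Difference between each stage and the one before it."""
    return [mean_abs_diff(prev, cur) for prev, cur in zip(stages, stages[1:])]


def unsharp_mask(x: Signal, amount: float = 1.0) -> Signal:
    """x + amount * (x - blur(x)), with a 3-tap box blur and replicated edges."""
    padded = [x[0]] + list(x) + [x[-1]]
    blurred = [
        (padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(x))
    ]
    return [v + amount * (v - b) for v, b in zip(x, blurred)]


def repeated_unsharp(x: Signal, rounds: int, amount: float = 1.0) -> List[Signal]:
    """Return [x, unsharp(x), unsharp(unsharp(x)), ...] with rounds applications."""
    stages = [list(x)]
    for _ in range(rounds):
        stages.append(unsharp_mask(stages[-1], amount))
    return stages


# Toy step edge; the deltas show how much each extra sharpening pass changes it.
baseline_stages = repeated_unsharp([0.0, 0.0, 1.0, 1.0], rounds=3)
baseline_deltas = stage_deltas(baseline_stages)
```

Feeding the per-stage targets of a recurrent run into `stage_deltas` would give the analogue of the converging curve, and `baseline_deltas` the analogue of the unsharp-masking curve; the sketch makes no claim about what either curve looks like.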

**Figure 5.** Comparison of output images in each stage. For better comparison, each row shows a part (64 × 64 pixels) of an image in DIV2K image set. As described in the introduction, the originals are naturally blurred in the capturing and the transmission process. The stage number increases from left to right. As the stage increases, it can be observed that the texture is clearly seen and the result becomes clearer. …

**Figure 6.** The MOS results for each stage. The final score, MOS (Red line), peaked at HR3. The increase in MOS is similar to the convergence of the difference, yellow line in Fig. 4. Also the specific measures in other colors tend to be increasing until HR3 and then they saturate or decrease slightly.

Original abstract

We introduce a new learning strategy for image enhancement by recurrently training the same simple super-resolution (SR) network multiple times. After initially training an SR network by using pairs of a corrupted low resolution (LR) image and an original image, the proposed method makes use of the trained SR network to generate new high resolution (HR) images with a doubled resolution from the original uncorrupted images. Then, the new HR images are downscaled to the original resolution, and these serve as target images for the SR network in the next stage. The HR images newly generated by the repeatedly trained SR network show better image quality, and this strategy of training LR to mimic new HR can lead to a more efficient SR network. Up to a certain point, by repeating this process multiple times, better and better images are obtained. This recurrent learning strategy for SR can be a good solution for downsizing convolution networks and making a more efficient SR network. To measure the enhanced image quality, for the first time in this area of super-resolution and image enhancement, we use the VIQET MOS score, which reflects human visual quality more accurately than the conventional MSE measure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a recurrent training strategy for super-resolution (SR) networks: an SR network is first trained on pairs of corrupted LR images and original HR images; the trained network then generates new HR images from uncorrupted originals, which are downscaled to serve as targets for the next training iteration. The authors claim that repeating this process produces progressively better image quality up to a point, yields a more efficient SR network, and demonstrate the approach using the VIQET MOS score for evaluation.

Significance. If the recurrent targets are verifiably higher-quality and the observed gains are not artifacts of the loop, the method could offer a lightweight way to improve SR performance without enlarging the network architecture. The use of VIQET MOS is a positive step toward perceptual evaluation, but the absence of supporting experiments makes the practical significance difficult to assess at present.

major comments (3)

[Abstract] The central claim that 'by repeating this process multiple times, better and better images are obtained' is unsupported by any per-iteration quantitative metrics (VIQET, PSNR, or otherwise), ablation isolating the recurrent target generation from simply training longer on the original pairs, or comparison against standard SR baselines. This evidence is load-bearing for the claim that the recurrent loop itself improves quality.
[Method] Recurrent stage procedure: the assumption that downscaled SR outputs on uncorrupted images constitute higher-quality targets is not validated; no experiment compares the quality or bias of these generated targets against the original HR images, leaving open the risk that initial hallucinations become self-reinforcing.
[Experiments / Evaluation] Although VIQET MOS is introduced, the manuscript provides no numerical results, tables, or figures showing scores across recurrent stages or against baselines, so the reported improvement cannot be verified or reproduced.

minor comments (2)

[Method] The number of recurrent stages is listed as a free parameter; clarify how it is chosen in practice and whether an early-stopping criterion based on target quality is used.
[Method] Clarify the precise downscaling operator applied to the generated HR images and whether it matches the corruption model used in the initial training pairs.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We agree that the current version of the manuscript requires additional quantitative evidence to support the central claims regarding the benefits of recurrent training. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central claim that 'by repeating this process multiple times, better and better images are obtained' is unsupported by any per-iteration quantitative metrics (VIQET, PSNR, or otherwise), ablation isolating the recurrent target generation from simply training longer on the original pairs, or comparison against standard SR baselines. This evidence is load-bearing for the claim that the recurrent loop itself improves quality.

Authors: We acknowledge that the abstract's claim regarding progressive improvement through recurrent training is not supported by per-iteration metrics or ablations in the current manuscript. In the revised version, we will add tables and figures with VIQET MOS and PSNR scores across recurrent stages, an ablation isolating the recurrent target generation from extended training on the original pairs, and comparisons to standard SR baselines to substantiate the claim that the recurrent loop itself drives the quality gains.

Revision: yes

Referee: [Method] Recurrent stage procedure: the assumption that downscaled SR outputs on uncorrupted images constitute higher-quality targets is not validated; no experiment compares the quality or bias of these generated targets against the original HR images, leaving open the risk that initial hallucinations become self-reinforcing.

Authors: The method assumes that SR outputs on uncorrupted images provide higher-quality targets after initial training, but we agree this assumption lacks direct validation against original HR images. We will add an experiment in the revision that compares the quality and potential bias of the generated targets to the original HR images using perceptual metrics and analysis to address the risk of self-reinforcing hallucinations.

Revision: yes

Referee: [Experiments / Evaluation] Although VIQET MOS is introduced, the manuscript provides no numerical results, tables, or figures showing scores across recurrent stages or against baselines, so the reported improvement cannot be verified or reproduced.

Authors: We recognize that the absence of numerical VIQET MOS results, tables, and figures prevents verification and reproducibility. The revised manuscript will include explicit numerical results, tables, and figures showing VIQET MOS scores across recurrent stages and against baselines to enable verification of the reported improvements.

Revision: yes

Circularity Check

0 steps flagged

No circularity: iterative training uses external data and standard optimization

The paper describes a recurrent training loop for an SR network: initial training on corrupted LR-original pairs, followed by generating new HR images from uncorrupted inputs, downscaling them as new targets, and repeating. This procedure relies on external image data and conventional supervised optimization at each step rather than reducing to a self-referential equation, fitted parameter renamed as prediction, or self-citation chain. No equations, uniqueness theorems, or ansatzes are presented that collapse the claimed improvement to the inputs by construction. The method remains falsifiable against external benchmarks and does not invoke load-bearing self-citations.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim depends on the empirical success of the iterative target refinement process, with the repetition count as the main tunable element.

free parameters (1)

number of recurrent stages
The number of times the training process is repeated is a hyperparameter chosen to balance improvement and potential degradation.

axioms (1)

Domain assumption: the super-resolution network can be trained to convergence using standard backpropagation on image pairs.
The method assumes the initial and subsequent trainings succeed in learning useful mappings.
