Multi-level Wavelet Convolutional Neural Networks

Hongzhi Zhang; Pengju Liu; Wangmeng Zuo; Wei Lian

arXiv: 1907.03128 · v1 · pith:YOHXSMQ4 · submitted 2019-07-06 · cs.CV · eess.IV


Pith reviewed 2026-05-25 01:33 UTC · model grok-4.3


keywords: wavelet transform, convolutional neural network, image restoration, receptive field, U-Net, denoising, super-resolution, pooling


The pith

Multi-level wavelet transforms embedded in CNNs allow larger receptive fields with less information loss than pooling or dilated convolutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a multi-level wavelet CNN that inserts wavelet decomposition into the network to downsample features while expanding the receptive field. This is intended to overcome the information loss from pooling and the gridding artifacts from dilated filters. The architecture uses a U-Net base with inverse wavelet transforms to rebuild high-resolution maps for image restoration tasks. A sympathetic reader would care because it promises a more efficient way to handle large context in convolutional networks for vision problems. The model is also positioned as a generalization of average pooling applicable to classification.

Core claim

By embedding the wavelet transform into the CNN, the MWCNN reduces the resolution of feature maps to increase the receptive field size while preserving information better than pooling, and uses inverse wavelet transform to reconstruct the high resolution feature maps from the decomposed versions. This provides a better trade-off between receptive field and computational efficiency, and can replace pooling operations in CNNs.
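To make the "preserving information better than pooling" part of the claim concrete, here is a minimal pure-Python sketch of a one-level 2-D Haar DWT and its inverse on a single feature map. It assumes orthonormal Haar filters (scaled by 1/2), even height and width, and the LL/LH/HL/HH labels assigned in the comments; the paper's own scaling and sub-band naming may differ. The four half-resolution sub-bands together hold as many numbers as the input, and the IWT recovers the input exactly, whereas pooling keeps only one number per 2×2 block.

```python
def haar_dwt2(x):
    """One-level 2-D Haar DWT of an H x W list of lists (H and W even).

    Returns the sub-bands (LL, LH, HL, HH), each (H/2) x (W/2), computed with
    orthonormal filters (scale 1/2) so that the transform preserves energy.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even height and width")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)     # low-pass in both directions
            r_lh.append((-a - b + c + d) / 2)    # bottom row minus top row
            r_hl.append((-a + b - c + d) / 2)    # right column minus left column
            r_hh.append((a - b - c + d) / 2)     # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_iwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the H x W map from its four sub-bands."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s0, s1, s2, s3 = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s0 - s1 - s2 + s3) / 2
            x[2 * i][2 * j + 1] = (s0 - s1 + s2 - s3) / 2
            x[2 * i + 1][2 * j] = (s0 + s1 - s2 - s3) / 2
            x[2 * i + 1][2 * j + 1] = (s0 + s1 + s2 + s3) / 2
    return x


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(x)
print(ll)                              # [[7.0, 11.0], [23.0, 27.0]]
print(haar_iwt2(ll, lh, hl, hh) == x)  # True: perfect reconstruction
```

Because the filters are orthonormal, the sum of squares over the four sub-bands equals that of the input, which is one way to state "no information lost" precisely.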

What carries the argument

The multi-level wavelet transform (with inverse) embedded at multiple levels in the U-Net architecture to decompose and reconstruct feature maps.
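As a rough illustration of why stacking DWT levels enlarges the receptive field, the sketch below applies the standard receptive-field recurrence to a hypothetical contracting path. It treats each Haar DWT as a 2×2, stride-2 operator (its spatial footprint) and assumes two 3×3 convolutions per level plus two at the bottom; these layer counts are illustrative, not the paper's configuration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, per axis) of a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Uses the recurrence r += (k - 1) * j, j *= s, where j is the distance
    in input pixels between neighboring positions of the current map.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


CONV3 = (3, 1)  # 3x3 convolution, stride 1
DWT = (2, 2)    # one Haar DWT level: 2x2 footprint, stride 2

# Hypothetical contracting path: three levels of [conv, conv, DWT], then two convs.
with_dwt = [CONV3, CONV3, DWT] * 3 + [CONV3, CONV3]
# The same eight convolutions without any downsampling.
no_downsampling = [CONV3] * 8

print(receptive_field(with_dwt))         # 68
print(receptive_field(no_downsampling))  # 17
```

A 2×2, stride-2 pooling layer has the same footprint and would give the same number, so the case for DWT rests on what it keeps (all four sub-bands), not on receptive-field growth alone.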

If this is right

Improved results on image denoising, single image super-resolution, and JPEG artifact removal compared to prior methods.
Effective replacement for pooling in any CNN that requires downsampling operations.
Generalization of average pooling and improvement over dilated filters without checkerboard patterns.
Extension to object classification tasks with maintained efficiency.
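The "generalization of average pooling" point can be checked in a few lines. Assuming unnormalized Haar filters (entries ±1, matching the sum-pooling framing of Figure 4), the LL sub-band of each 2×2 block is exactly 2×2 sum pooling, i.e., four times average pooling, while the other three sub-bands carry the detail that pooling throws away. The helper names below are hypothetical.

```python
def pool2(x, reduce):
    """Apply `reduce` to every non-overlapping 2x2 block of x (stride 2)."""
    return [
        [reduce([x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]])
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def sum_pool2(x):
    return pool2(x, sum)


def avg_pool2(x):
    return pool2(x, lambda v: sum(v) / 4)


def haar_subbands(x):
    """Unnormalized Haar sub-bands (LL, LH, HL, HH) of each 2x2 block.

    Block pixels are ordered [top-left, top-right, bottom-left, bottom-right].
    """
    return (
        pool2(x, lambda v: v[0] + v[1] + v[2] + v[3]),    # LL
        pool2(x, lambda v: -v[0] - v[1] + v[2] + v[3]),   # LH
        pool2(x, lambda v: -v[0] + v[1] - v[2] + v[3]),   # HL
        pool2(x, lambda v: v[0] - v[1] - v[2] + v[3]),    # HH
    )


# Two inputs that pooling cannot tell apart ...
x1 = [[1, 0], [0, 0]]
x2 = [[0, 1], [0, 0]]
print(sum_pool2(x1) == sum_pool2(x2))          # True
# ... but the full set of Haar sub-bands can.
print(haar_subbands(x1) == haar_subbands(x2))  # False
print(haar_subbands(x1)[0] == sum_pool2(x1))   # True: LL is sum pooling
```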

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Wavelet-based downsampling might preserve frequency information better in other signal processing tasks beyond images.
If the reconstruction is artifact-free, similar embeddings could be tested in video or 3D CNNs for temporal or volumetric data.
The approach could lead to parameter-free ways to control receptive field growth in network design.

Load-bearing premise

The inverse wavelet transform can reconstruct high-resolution feature maps from the low-resolution wavelet coefficients without adding significant artifacts or losing the efficiency advantage.

What would settle it

A direct comparison experiment where MWCNN is applied to a standard benchmark like BSD68 for denoising and shows no improvement in PSNR or SSIM over a baseline U-Net with strided convolutions or pooling at equivalent computational cost.
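For reference, the PSNR used in such a comparison is 10·log10(MAX²/MSE). The sketch below is a minimal version assuming 8-bit intensities (MAX = 255), equally sized inputs, and no border cropping or color-space conversion; published benchmarks often evaluate on the Y channel and crop borders, so numbers from this helper are not directly comparable to reported tables.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    diffs = [
        (r - e) ** 2
        for row_r, row_e in zip(reference, estimate)
        for r, e in zip(row_r, row_e)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


clean = [[100, 120], [140, 160]]
noisy = [[101, 119], [141, 159]]     # every pixel off by 1, so MSE = 1
print(round(psnr(clean, noisy), 2))  # 48.13
```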

Figures

Figures reproduced from arXiv: 1907.03128 by Hongzhi Zhang, Pengju Liu, Wangmeng Zuo, Wei Lian.

**Figure 2.** From WPT to MWCNN. Intuitively, WPT can be seen …

**Figure 3.** Multi-level wavelet-CNN architecture. It consists of two parts: the contracting and expanding subnetworks. Each solid …

**Figure 4.** Illustration of average pooling, dilated filtering and the proposed MWCNN, taking one CNN block as an example: (a) sum-pooling with factor 2 leads to the most significant information loss, which is not suitable for image restoration; (b) dilated filtering with rate 2 is equivalent to shared-parameter convolution on sub-images; (c) the proposed MWCNN first decomposes an image into 4 sub-bands and then concatenates the…
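The equivalence in panel (b) is easy to verify in one dimension. The sketch below, with hypothetical helper names, computes a "valid" dilated correlation with rate 2 and shows it equals running the same undilated kernel on the even- and odd-indexed sub-sequences and interleaving the results; in 2-D the image splits into four sub-images in the same way.

```python
def conv1d(x, w, dilation=1):
    """'Valid' 1-D correlation of x with kernel w at the given dilation rate."""
    span = (len(w) - 1) * dilation
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


def dilated_via_subsequences(x, w):
    """Rate-2 dilated correlation computed as a shared kernel on two sub-sequences."""
    even = conv1d(x[0::2], w)  # same weights on the even-indexed samples
    odd = conv1d(x[1::2], w)   # ... and on the odd-indexed samples
    out = [0] * (len(even) + len(odd))
    out[0::2] = even           # interleave back into the original order
    out[1::2] = odd
    return out


x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
w = [1, -2, 1]
print(conv1d(x, w, dilation=2) == dilated_via_subsequences(x, w))  # True
```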

**Figure 5.** Illustration of the gridding effect, taking 3-layer CNNs as an example: (a) dilated filtering with rate 2 suffers from a large amount of information loss; (b) two neighboring pixels are based on information from completely non-overlapping locations; and (c) our MWCNN can perfectly avoid these drawbacks. Compared with dilated filtering, MWCNN can also avoid the gridding effect. With the increase of dep…
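The gridding effect in panel (b) can be reproduced by tracking which input positions feed a given output position. This 1-D sketch (a simplification of the 2-D figure) stacks three kernel-3 layers with dilation 2: output position 0 depends only on even inputs and position 1 only on odd inputs, so the two neighbors share no input at all.

```python
def input_support(position, layers=3, kernel=3, dilation=1):
    """Input indices that can influence `position` after a stack of 1-D convs.

    Assumes centred kernels of odd size and ignores image borders.
    """
    half = kernel // 2
    support = {position}
    for _ in range(layers):
        support = {
            p + dilation * (k - half) for p in support for k in range(kernel)
        }
    return support


a = input_support(0, dilation=2)
b = input_support(1, dilation=2)
print(sorted(a))  # [-6, -4, -2, 0, 2, 4, 6]
print(a & b)      # set(): neighboring outputs see disjoint inputs

# With ordinary (rate-1) convolutions the two supports overlap heavily.
print(len(input_support(0) & input_support(1)))  # 6
```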

**Figure 6.** Image denoising results of “Test044” (Set68) with noise level of 50.

TABLE II: Average PSNR (dB) / SSIM results of the competing methods for SISR with scale factors S = 2, 3 and 4 on datasets Set5, Set14, BSD100 and Urban100. Red color indicates the best performance. Columns: Dataset, S, VDSR [2], DnCNN [5], RED30 [20], SRResNet [11], LapSRN [3], DRRN [17], MemNet [19], WaveResNet [45], SRMDNF [44], MWCNN(P), MWCNN. First row: Set5 ×2 37.…

**Figure 7.** Single image super-resolution: result of “253027” (BSD100) with upscaling factor of ×4.

…network. For qualitative comparisons, we use the source code of nine CNN-based methods, including VDSR [2], DnCNN [5], RED30 [20], SRResNet [11], LapSRN [3], DRRN [17], MemNet [19], WaveResNet [45] and SRMDNF [44]. Since the source code of SRResNet is not released, its results in Table II are incomplete. And th…

**Figure 8.** JPEG image artifacts removal: visual results of “carnivaldolls” (LIVE1) with quality factor of 10.

…the JPEG encoder. In our experiments, MWCNN is compared to four competing methods, i.e., ARCNN [35], TNRD [26], DnCNN [5], and MemNet [19]. The results of MemNet [19] and TNRD [26] are incomplete according to their papers and released source code. Table III shows the average PSNR/SSIM results of the competing…

Original abstract

In computer vision, convolutional networks (CNNs) often adopt pooling to enlarge the receptive field, which has the advantage of low computational complexity. However, pooling can cause information loss and thus is detrimental to further operations such as feature extraction and analysis. Recently, dilated filters have been proposed to trade off between receptive field size and efficiency. But the accompanying gridding effect can cause a sparse sampling of input images with checkerboard patterns. To address this problem, in this paper, we propose a novel multi-level wavelet CNN (MWCNN) model to achieve a better trade-off between receptive field size and computational efficiency. The core idea is to embed wavelet transform into CNN architecture to reduce the resolution of feature maps while at the same time, increasing receptive field. Specifically, MWCNN for image restoration is based on U-Net architecture, and inverse wavelet transform (IWT) is deployed to reconstruct the high resolution (HR) feature maps. The proposed MWCNN can also be viewed as an improvement of dilated filter and a generalization of average pooling, and can be applied not only to image restoration tasks, but also to any CNN requiring a pooling operation. The experimental results demonstrate the effectiveness of the proposed MWCNN for tasks such as image denoising, single image super-resolution, JPEG image artifacts removal and object classification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MWCNN replaces pooling with multi-level wavelets inside a U-Net and uses IWT to restore resolution, but the abstract supplies no metrics or implementation details, so the practical gain is still unproven.


The main thing to know is that this paper places discrete wavelet transforms at multiple levels inside a U-Net encoder to shrink feature maps without the information loss of pooling or the gridding of dilated convolutions, then applies the inverse transform to return to full resolution for restoration tasks. It also claims the same block can serve as a drop-in replacement in any CNN that needs downsampling. That framing is clear and directly targets two well-known pain points.

The architecture choice itself is the concrete new piece: a multi-level wavelet embedding tuned for image denoising, super-resolution, JPEG deblocking, and classification. The paper does a straightforward job of spelling out why the alternatives fall short and why wavelets might give a better receptive-field-versus-cost trade-off. The generalization of average pooling is a useful way to think about it.

The soft spots are exactly where the stress-test note points. The abstract asserts that IWT reconstructs the maps without new artifacts or efficiency costs, yet it names no wavelet family or boundary handling and offers no argument that the step stays clean on non-stationary CNN features rather than natural images. No numbers, baselines, or ablations appear, so the effectiveness claim rests on evidence the abstract does not show. Prior wavelet-CNN work already exists, so the step feels incremental rather than a breakthrough.

This is for people building restoration networks who are willing to test signal-processing blocks in place of standard downsampling. A reader who wants a practical alternative to pooling or dilation could get value once the experiments are filled in. I would send it to peer review because the motivation is solid and the proposal is specific enough that referees can check the missing pieces directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes a novel multi-level wavelet CNN (MWCNN) model that embeds the wavelet transform into CNN architectures to achieve a better trade-off between receptive field size and computational efficiency. It uses discrete wavelet transform (DWT) to reduce the resolution of feature maps and inverse wavelet transform (IWT) to reconstruct high-resolution feature maps within a U-Net architecture for image restoration tasks. The model is presented as an improvement over dilated filters and a generalization of average pooling, with claimed applicability to various CNNs and experimental effectiveness on image denoising, single image super-resolution, JPEG artifact removal, and object classification.

Significance. If the results hold, this approach could offer an efficient alternative to pooling and dilated convolutions in CNN design by leveraging wavelet properties for resolution reduction and receptive field expansion without gridding effects or information loss. The conceptual framing as a generalization of pooling is a positive aspect.

major comments (2)

[Abstract] The central claim of experimental effectiveness on four tasks rests on an assertion without any accompanying metrics, baselines, ablation details, or error analysis, which is load-bearing since the soundness of the proposal depends on demonstrated performance gains.
[Abstract] The description of deploying IWT to reconstruct HR feature maps provides no implementation specifics such as the wavelet family, boundary handling, or how differentiability is ensured for backpropagation, leaving the assumption of artifact-free integration unverified for CNN feature maps.
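One common way to meet the even-size requirement of a stride-2 Haar DWT, offered here as an assumption rather than as the paper's method, is to extend an odd-sized map by repeating its last row and/or column (a one-sample, half-sample-symmetric extension), run the DWT, the network and the IWT, and then crop back to the original size. A minimal sketch with hypothetical helper names:

```python
def pad_to_even(x):
    """Repeat the last row/column of x if its height/width is odd.

    Returns the padded map and the original (height, width) for cropping.
    """
    h, w = len(x), len(x[0])
    padded = [row + [row[-1]] if w % 2 else list(row) for row in x]
    if h % 2:
        padded.append(list(padded[-1]))
    return padded, (h, w)


def crop(x, size):
    """Undo pad_to_even by keeping the top-left (height, width) region."""
    h, w = size
    return [row[:w] for row in x[:h]]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 x 3: both dimensions odd
padded, size = pad_to_even(x)
print(padded)                   # [[1, 2, 3, 3], [4, 5, 6, 6], [7, 8, 9, 9], [7, 8, 9, 9]]
print(crop(padded, size) == x)  # True
```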

minor comments (1)

[Abstract] The statement that MWCNN 'can be applied to not only image restoration tasks, but also any CNNs requiring a pooling operation' would benefit from more precise examples of such CNNs or operations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, noting that the abstract serves as a concise summary while the full manuscript provides supporting details and experiments.


Referee: [Abstract] The central claim of experimental effectiveness on four tasks rests on an assertion without any accompanying metrics, baselines, ablation details, or error analysis, which is load-bearing since the soundness of the proposal depends on demonstrated performance gains.

Authors: The abstract provides a high-level overview of the contributions. Quantitative metrics (e.g., PSNR/SSIM gains), baselines (DnCNN, VDSR, etc.), ablation studies on wavelet decomposition levels, and error analyses are presented in detail in Sections 4.1–4.4 of the manuscript for all four tasks. We can partially revise the abstract to include one or two representative performance numbers if space allows, to better highlight the gains without altering its summary nature. Revision: partial.
Referee: [Abstract] The description of deploying IWT to reconstruct HR feature maps provides no implementation specifics such as the wavelet family, boundary handling, or how differentiability is ensured for backpropagation, leaving the assumption of artifact-free integration unverified for CNN feature maps.

Authors: The abstract is intentionally brief. Full implementation details appear in Section 3: we employ the Haar wavelet (orthogonal with perfect reconstruction), use symmetric extension for boundary handling, and note that DWT/IWT are fixed linear operations and thus differentiable, enabling seamless end-to-end backpropagation. Experiments in Section 4 confirm artifact-free integration through visual and quantitative results. No change to the abstract is needed, as these specifics belong in the methods section. Revision: no.
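The differentiability point can be made slightly sharper. Written as a matrix W acting on the four pixels of a 2×2 block, the orthonormal Haar DWT (an assumption; the paper may scale its filters differently) is orthogonal, so the inverse of W is its transpose W^T. Since y = W x, backpropagation gives dL/dx = W^T dL/dy, i.e., the gradient through a DWT layer is simply the IWT of the incoming gradient. A small check in pure Python:

```python
# Rows give LL, LH, HL, HH from the block pixels [a, b, c, d]
# (top-left, top-right, bottom-left, bottom-right), orthonormal scaling.
W = [[0.5 * v for v in row] for row in (
    (1, 1, 1, 1),
    (-1, -1, 1, 1),
    (-1, 1, -1, 1),
    (1, -1, -1, 1),
)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(matmul(transpose(W), W) == identity)  # True: W is orthogonal

# Backward pass of y = W x: grad_x = W^T grad_y, which is the IWT of grad_y.
grad_y = [0.3, -1.2, 0.5, 2.0]
grad_x = matvec(transpose(W), grad_y)
print(matvec(W, grad_x))  # recovers grad_y, up to float rounding
```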

Circularity Check

0 steps flagged

No circularity: architectural proposal without load-bearing derivations


The paper proposes an MWCNN architecture that embeds discrete wavelet transform (DWT) and inverse wavelet transform (IWT) into a U-Net backbone for image restoration tasks. No equations, predictions, or first-principles derivations are presented that reduce to fitted parameters, self-definitions, or self-citation chains. The core claim is an engineering suggestion (wavelet-based downsampling as an alternative to pooling or dilation), supported by experimental results on standard tasks rather than any mathematical equivalence to its inputs. Self-citations, if present in the full text, are not load-bearing for the central architectural idea. This is a standard non-circular model proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that wavelet transforms integrate cleanly into CNN feature maps.

pith-pipeline@v0.9.0 · 5757 in / 1094 out tokens · 21455 ms · 2026-05-25T01:33:29.653982+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear

The core idea is to embed wavelet transform into CNN architecture to reduce the resolution of feature maps while at the same time, increasing receptive field. Specifically, MWCNN for image restoration is based on U-Net architecture, and inverse wavelet transform (IWT) is deployed to reconstruct the high resolution (HR) feature maps.
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

DWT can be treated as a downsampling operation and extended to any CNN where a pooling operation is required.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 5 internal anchors

[1] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016.

[2] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.

[3] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[4] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016.

[5] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, PP(99):1–1, 2016.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[7] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[8] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.

[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[10] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
[11] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[12] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep CNN denoiser prior for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3929–3938, 2017.

[13] A. Adler, D. Boublil, M. Elad, and M. Zibulevsky. A deep learning approach to block-based compressed sensing of images. arXiv preprint arXiv:1606.01519, 2016.

[14] Y. Romano, M. Elad, and P. Milanfar. The little engine that could: Regularization by denoising (RED). arXiv preprint arXiv:1611.02862, 2016.

[15] S. Yan, X. Xu, D. Xu, S. Lin, and X. Li. Image classification with densely sampled image windows and generalized adaptive multiple kernel learning. IEEE Transactions on Cybernetics, 45(3):381–390, 2015.

[16] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.

[17] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[18] C. Dong, C. L. Chen, and X. Tang. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pages 391–407, 2016.

[19] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In IEEE International Conference on Computer Vision, 2017.
[20] X. Mao, C. Shen, and Y. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pages 2802–2810, 2016.

[21] I. Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5):961–1005, 1990.

[22] I. Daubechies. Ten lectures on wavelets. SIAM, 1992.

[23] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241, 2015.

[24] P. Liu, H. Zhang, K. Zhang, L. Lin, and W. Zuo. Multi-level wavelet-CNN for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 773–782, 2018.

[25] M. R. Banham and A. K. Katsaggelos. Digital image restoration. IEEE Signal Processing Magazine, 14(2):24–41, 1997.

[26] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1–1, 2015.

[27] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.

[28] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.

[29] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2774–2781, 2014.
[30] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, 2009.

[31] F. Agostinelli, M. R. Anderson, and H. Lee. Robust image denoising with multi-column deep neural networks. In Advances in Neural Information Processing Systems, pages 1493–1501, 2013.

[32] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems, pages 769–776, 2009.

[33] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In International Conference on Neural Information Processing Systems, pages 341–349, 2012.

[34] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399, 2012.

[35] C. Dong, Y. Deng, C. Change Loy, and X. Tang. Compression artifacts reduction by a deep convolutional network. In IEEE International Conference on Computer Vision, pages 576–584, 2015.

[36] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1132–1140, 2017.

[37] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu. Image super-resolution using very deep residual channel attention networks. In European Conference on Computer Vision, 2018.

[38] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1637–1645, 2016.

[39] V. Santhanam, V. I. Morariu, and L. S. Davis. Generalized deep image to image regression. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5609–5619, 2017.

[40] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a fast and flexible solution for CNN based image denoising. IEEE Transactions on Image Processing, 2018.
[41] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[42] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.

[43] G. Riegler, S. Schulter, M. Ruther, and H. Bischof. Conditioned regression models for non-blind single image super-resolution. In IEEE International Conference on Computer Vision, 2015.

[44] K. Zhang, W. Zuo, and L. Zhang. Learning a single convolutional super-resolution network for multiple degradations. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.

[45] W. Bae, J. Yoo, and J. C. Ye. Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1141–1149, 2017.

[46] T. Guo, H. S. Mousavi, T. H. Vu, and V. Monga. Deep wavelet prediction for image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017.

[47] Y. Han and J. C. Ye. Framing U-Net via deep convolutional framelets: Application to sparse-view CT. IEEE Transactions on Medical Imaging, pages 1418–1429, 2018.

[48] J. C. Ye and Y. S. Han. Deep convolutional framelets: A general deep learning for inverse problems. Society for Industrial and Applied Mathematics, 2018.

[49] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.

[50] D. Han, J. Kim, and J. Kim. Deep pyramidal residual networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 6307–6315, 2017.
[51] S. Zhai, Y. Cheng, Z. M. Zhang, and W. Lu. Doubly convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1082–1090, 2016.

[52] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2261–2269, 2017.

[53] S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference, 2016.

[54] A. Takeki, D. Ikami, G. Irie, and K. Aizawa. Parallel grid pooling for data augmentation. In European Conference on Computer Vision, 2018.

[55] Q. Wang, Z. Gao, J. Xie, W. Zuo, and P. Li. Global gated mixture of second-order pooling for improving deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1277–1286, 2018.

[56] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):674–693, 1989.

[57] A. N. Akansu and R. A. Haddad. Multiresolution signal decomposition: transforms, subbands, and wavelets. Academic Press, 2001.

[58] A. S. Lewis and G. Knowles. Image compression using the 2-D wavelet transform. IEEE Transactions on Image Processing, 1(2):244–250, 1992.

[59] S. G. Chang, B. Yu, and M. Vetterli. Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on Image Processing, 9(9):1532–1546, 2000.

[60] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[61] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.
[62] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1122–1131, 2017.

[63] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In IEEE International Conference on Computer Vision, volume 2, pages 416–423, 2001.

[64] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.

[65] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. 2012.

[66] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces, pages 711–730. Springer, 2010.

[67] A. K. Moorthy and A. C. Bovik. Visual importance pooling for image quality assessment. IEEE Journal of Selected Topics in Signal Processing, 3(2):193–201, 2009.

[68] A. Vedaldi and K. Lenc. MatConvNet: Convolutional neural networks for MATLAB. In the 23rd ACM International Conference on Multimedia, pages 689–692, 2015.

[69] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical Report, Citeseer, 2009.

[70] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop, volume 2011, page 5, 2011.

[71] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[72] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[73] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. In International Conference on Learning Representations Workshop, 2015.

[1] C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016.

[2] J. Kim, J. K. Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.

[3] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[4] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016.

[5] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, PP(99):1–1, 2016.

[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[7] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[8] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.

[9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[10] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.

[11] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[12] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep CNN denoiser prior for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3929–3938, 2017.

[13] A. Adler, D. Boublil, M. Elad, and M. Zibulevsky. A deep learning approach to block-based compressed sensing of images. arXiv preprint arXiv:1606.01519, 2016.

[14] Y. Romano, M. Elad, and P. Milanfar. The little engine that could: Regularization by denoising (RED). arXiv preprint arXiv:1611.02862, 2016.

[15] S. Yan, X. Xu, D. Xu, S. Lin, and X. Li. Image classification with densely sampled image windows and generalized adaptive multiple kernel learning. IEEE Transactions on Cybernetics, 45(3):381–390, 2015.

[16] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.

[17] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[18] C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pages 391–407, 2016.

[19] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In IEEE International Conference on Computer Vision, 2017.

[20] X. Mao, C. Shen, and Y. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pages 2802–2810, 2016.

[21] I. Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5):961–1005, 1990.

[22] I. Daubechies. Ten Lectures on Wavelets. SIAM, 1992.

[23] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241, 2015.

[24] P. Liu, H. Zhang, K. Zhang, L. Lin, and W. Zuo. Multi-level wavelet-CNN for image restoration. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 773–782, 2018.

[25] M. R. Banham and A. K. Katsaggelos. Digital image restoration. IEEE Signal Processing Magazine, 14(2):24–41, 1997.

[26] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1–1, 2015.

[27] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.

[28] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.

[29] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2774–2781, 2014.

[30] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, 2009.

[31] F. Agostinelli, M. R. Anderson, and H. Lee. Robust image denoising with multi-column deep neural networks. In Advances in Neural Information Processing Systems, pages 1493–1501, 2013.

[32] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems, pages 769–776, 2009.

[33] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In International Conference on Neural Information Processing Systems, pages 341–349, 2012.

[34] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399, 2012.

[35] C. Dong, Y. Deng, C. C. Loy, and X. Tang. Compression artifacts reduction by a deep convolutional network. In IEEE International Conference on Computer Vision, pages 576–584, 2015.

[36] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1132–1140, 2017.

[37] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu. Image super-resolution using very deep residual channel attention networks. In European Conference on Computer Vision, 2018.

[38] J. Kim, J. K. Lee, and K. M. Lee. Deeply-recursive convolutional network for image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1637–1645, 2016.

[39] V. Santhanam, V. I. Morariu, and L. S. Davis. Generalized deep image to image regression. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5609–5619, 2017.

[40] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 2018.

[41] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[42] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.

[43] G. Riegler, S. Schulter, M. Ruther, and H. Bischof. Conditioned regression models for non-blind single image super-resolution. In IEEE International Conference on Computer Vision, 2015.

[44] K. Zhang, W. Zuo, and L. Zhang. Learning a single convolutional super-resolution network for multiple degradations. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.

[45] W. Bae, J. Yoo, and J. C. Ye. Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1141–1149, 2017.

[46] T. Guo, H. S. Mousavi, T. H. Vu, and V. Monga. Deep wavelet prediction for image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017.

[47] Y. Han and J. C. Ye. Framing U-Net via deep convolutional framelets: Application to sparse-view CT. IEEE Transactions on Medical Imaging, pages 1418–1429, 2018.

[48] J. C. Ye and Y. S. Han. Deep convolutional framelets: A general deep learning for inverse problems. Society for Industrial and Applied Mathematics, 2018.

[49] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.

[50] D. Han, J. Kim, and J. Kim. Deep pyramidal residual networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 6307–6315, 2017.

[51] S. Zhai, Y. Cheng, Z. M. Zhang, and W. Lu. Doubly convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1082–1090, 2016.

[52] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2261–2269, 2017.

[53] S. Zagoruyko and N. Komodakis. Wide residual networks. In British Machine Vision Conference, 2016.

[54] A. Takeki, D. Ikami, G. Irie, and K. Aizawa. Parallel grid pooling for data augmentation. In European Conference on Computer Vision, 2018.

[55] Q. Wang, Z. Gao, J. Xie, W. Zuo, and P. Li. Global gated mixture of second-order pooling for improving deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1277–1286, 2018.

[56] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):674–693, 1989.

[57] A. N. Akansu and R. A. Haddad. Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets. Academic Press, 2001.

[58] A. S. Lewis and G. Knowles. Image compression using the 2-D wavelet transform. IEEE Transactions on Image Processing, 1(2):244–250, 1992.

[59] S. G. Chang, B. Yu, and M. Vetterli. Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on Image Processing, 9(9):1532–1546, 2000.

[60] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[61] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.

[62] E. Agustsson and R. Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1122–1131, 2017.

[63] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In IEEE International Conference on Computer Vision, volume 2, pages 416–423, 2001.

[64] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.

[65] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. 2012.

[66] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces, pages 711–730. Springer, 2010.

[67] A. K. Moorthy and A. C. Bovik. Visual importance pooling for image quality assessment. IEEE Journal of Selected Topics in Signal Processing, 3(2):193–201, 2009.

[68] A. Vedaldi and K. Lenc. MatConvNet: Convolutional neural networks for MATLAB. In 23rd ACM International Conference on Multimedia, pages 689–692, 2015.

[69] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical Report, Citeseer, 2009.

[70] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop, volume 2011, page 5, 2011.

[71] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[72] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[73] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. In International Conference on Learning Representations Workshop, 2015.