Multi-level Wavelet Convolutional Neural Networks
Pith reviewed 2026-05-25 01:33 UTC · model grok-4.3
The pith
Multi-level wavelet transforms embedded in CNNs allow larger receptive fields with less information loss than pooling or dilated convolutions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By embedding the wavelet transform into the CNN, the MWCNN reduces the resolution of feature maps to increase the receptive field size while preserving information better than pooling, and uses inverse wavelet transform to reconstruct the high resolution feature maps from the decomposed versions. This provides a better trade-off between receptive field and computational efficiency, and can replace pooling operations in CNNs.
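The receptive-field side of this claim can be checked with simple arithmetic. The sketch below is illustrative and not taken from the paper: it assumes 3x3 convolutions and models a single-level Haar DWT as a 2x2 filter applied with stride 2. `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field (input pixels along one axis) of a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # jump: spacing of neighbouring outputs, in input pixels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


conv = (3, 1)  # assumed 3x3 convolution, stride 1
dwt = (2, 2)   # assumption: Haar DWT seen as a 2x2 filter with stride 2

plain = receptive_field([conv] * 4)
with_dwt = receptive_field([conv, conv, dwt, conv, conv])
print(plain, with_dwt)
```

Under these assumptions, four plain convolutions see 9 input pixels per axis, while inserting one downsampling step between the second and third convolution raises that to 14, because every later layer steps over the input at twice the spacing.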
What carries the argument
The multi-level wavelet transform (with inverse) embedded at multiple levels in the U-Net architecture to decompose and reconstruct feature maps.
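A minimal sketch of the decompose-and-reconstruct step, assuming the orthonormal Haar wavelet and a single channel stored as nested lists. The function names are hypothetical and this is not the authors' implementation; it only illustrates that the four half-resolution subbands determine the input exactly. Subband naming (LH versus HL) varies between texts.

```python
def haar_dwt2(x):
    """Split an H x W map (H and W even) into four H/2 x W/2 subbands."""
    h, w = len(x), len(x[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # local average (scaled)
            lh[i // 2][j // 2] = (a + b - c - d) / 2  # top-versus-bottom difference
            hl[i // 2][j // 2] = (a - b + c - d) / 2  # left-versus-right difference
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_iwt2(ll, lh, hl, hh):
    """Rebuild the full-resolution map from the four subbands."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, v, u, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + v + u + t) / 2
            x[2 * i][2 * j + 1] = (s + v - u - t) / 2
            x[2 * i + 1][2 * j] = (s - v + u - t) / 2
            x[2 * i + 1][2 * j + 1] = (s - v - u + t) / 2
    return x


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
subbands = haar_dwt2(feature_map)
restored = haar_iwt2(*subbands)
print(restored == feature_map)
```

A multi-level version would apply `haar_dwt2` again to the outputs of the first level, with learned convolutions in between in the network described here.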
If this is right
- Improved results on image denoising, single image super-resolution, and JPEG artifact removal compared to prior methods.
- Effective replacement for pooling in any CNN that requires downsampling operations.
- Generalization of average pooling and improvement over dilated filters without checkerboard patterns.
- Extension to object classification tasks with maintained efficiency.
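The "generalization of average pooling" point can be made concrete with a small sketch. It assumes the orthonormal Haar scaling, under which the low-pass subband of each 2x2 block is exactly twice its average; the helper names are hypothetical. Average pooling keeps only this one subband, whereas the wavelet transform also keeps the three detail subbands.

```python
def pool2x2(x, reduce_block):
    """Apply reduce_block(a, b, c, d) to every non-overlapping 2x2 block of x."""
    h, w = len(x), len(x[0])
    return [[reduce_block(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def average_pool(x):
    return pool2x2(x, lambda a, b, c, d: (a + b + c + d) / 4)


def haar_ll(x):
    # Low-pass Haar subband with orthonormal scaling (an assumption of this sketch).
    return pool2x2(x, lambda a, b, c, d: (a + b + c + d) / 2)


x = [[1, 3, 2, 2],
     [5, 7, 4, 0]]
avg = average_pool(x)
ll = haar_ll(x)
print(avg, ll)
```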
Where Pith is reading between the lines
- Wavelet-based downsampling might preserve frequency information better in other signal processing tasks beyond images.
- If the reconstruction is artifact-free, similar embeddings could be tested in video or 3D CNNs for temporal or volumetric data.
- The approach could lead to parameter-free ways to control receptive field growth in network design.
Load-bearing premise
The inverse wavelet transform can reconstruct high-resolution feature maps from the low-resolution wavelet coefficients without adding significant artifacts or losing the efficiency advantage.
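One reason the premise is plausible is that a DWT level trades spatial resolution for channels without changing the number of stored values. The sketch below tracks shapes only and assumes the transform is applied to every channel; in the actual network, convolutions between levels may change the channel count. `dwt_shape` and `numel` are hypothetical helpers.

```python
def dwt_shape(shape):
    """Shape after one 2D DWT level: (C, H, W) -> (4C, H/2, W/2)."""
    c, h, w = shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return (4 * c, h // 2, w // 2)


def numel(shape):
    c, h, w = shape
    return c * h * w


shape = (1, 64, 64)
for level in range(1, 4):
    new_shape = dwt_shape(shape)
    # The transform itself neither adds nor discards values.
    assert numel(new_shape) == numel(shape)
    print(level, new_shape)
    shape = new_shape
```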
What would settle it
A direct comparison experiment where MWCNN is applied to a standard benchmark like BSD68 for denoising and shows no improvement in PSNR or SSIM over a baseline U-Net with strided convolutions or pooling at equivalent computational cost.
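For reference, PSNR in such a comparison follows the standard definition 10 * log10(peak^2 / MSE). The sketch below assumes 8-bit grayscale images stored as nested lists with a peak value of 255; it is a plain illustration, not the evaluation code used in the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


clean = [[100, 100], [100, 100]]
noisy = [[110, 90], [100, 100]]
print(round(psnr(clean, noisy), 2))
```

A fair test would report this value for both networks on the same noisy inputs, alongside a cost measure such as multiply-accumulate operations.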
Original abstract
In computer vision, convolutional networks (CNNs) often adopt pooling to enlarge the receptive field, which has the advantage of low computational complexity. However, pooling can cause information loss and is thus detrimental to subsequent operations such as feature extraction and analysis. Recently, the dilated filter has been proposed to trade off between receptive field size and efficiency, but the accompanying gridding effect can cause sparse sampling of input images with checkerboard patterns. To address this problem, we propose a novel multi-level wavelet CNN (MWCNN) model that achieves a better trade-off between receptive field size and computational efficiency. The core idea is to embed the wavelet transform into the CNN architecture to reduce the resolution of feature maps while at the same time increasing the receptive field. Specifically, MWCNN for image restoration is based on the U-Net architecture, and the inverse wavelet transform (IWT) is deployed to reconstruct the high-resolution (HR) feature maps. The proposed MWCNN can also be viewed as an improvement of the dilated filter and a generalization of average pooling, and can be applied not only to image restoration tasks but also to any CNN requiring a pooling operation. The experimental results demonstrate the effectiveness of the proposed MWCNN for tasks such as image denoising, single image super-resolution, JPEG image artifact removal, and object classification.
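The gridding effect mentioned in the abstract can be illustrated in one dimension. The sketch below assumes a stack of size-3 filters and simply enumerates which input offsets can influence one output position; `reachable_offsets` is a hypothetical helper, not something defined in the paper.

```python
def reachable_offsets(dilations, kernel=3):
    """Input offsets that can reach one output after stacking dilated filters."""
    half = kernel // 2
    offsets = {0}
    for dilation in dilations:
        offsets = {o + dilation * t
                   for o in offsets
                   for t in range(-half, half + 1)}
    return sorted(offsets)


dense = reachable_offsets([1, 1])    # ordinary convolutions
gridded = reachable_offsets([2, 2])  # two layers, both with dilation 2
print(dense)
print(gridded)
```

With a constant dilation of 2, only even offsets are ever reached, so half of the input positions along each axis never contribute to a given output. In two dimensions this sparse sampling produces the checkerboard pattern the abstract refers to.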
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a novel multi-level wavelet CNN (MWCNN) model that embeds the wavelet transform into CNN architectures to achieve a better trade-off between receptive field size and computational efficiency. It uses discrete wavelet transform (DWT) to reduce the resolution of feature maps and inverse wavelet transform (IWT) to reconstruct high-resolution feature maps within a U-Net architecture for image restoration tasks. The model is presented as an improvement over dilated filters and a generalization of average pooling, with claimed applicability to various CNNs and experimental effectiveness on image denoising, single image super-resolution, JPEG artifact removal, and object classification.
Significance. If the results hold, this approach could offer an efficient alternative to pooling and dilated convolutions in CNN design by leveraging wavelet properties for resolution reduction and receptive field expansion without gridding effects or information loss. The conceptual framing as a generalization of pooling is a positive aspect.
major comments (2)
- [Abstract] The central claim of experimental effectiveness on four tasks rests on an assertion without any accompanying metrics, baselines, ablation details, or error analysis, which is load-bearing since the soundness of the proposal depends on demonstrated performance gains.
- [Abstract] The description of deploying IWT to reconstruct HR feature maps provides no implementation specifics such as the wavelet family, boundary handling, or how differentiability is ensured for backpropagation, leaving the assumption of artifact-free integration unverified for CNN feature maps.
minor comments (1)
- [Abstract] The statement that MWCNN 'can be applied to not only image restoration tasks, but also any CNNs requiring a pooling operation' would benefit from more precise examples of such CNNs or operations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below, noting that the abstract serves as a concise summary while the full manuscript provides supporting details and experiments.
- Referee: [Abstract] The central claim of experimental effectiveness on four tasks rests on an assertion without any accompanying metrics, baselines, ablation details, or error analysis, which is load-bearing since the soundness of the proposal depends on demonstrated performance gains.
  Authors: The abstract provides a high-level overview of the contributions. Quantitative metrics (e.g., PSNR/SSIM gains), baselines (DnCNN, VDSR, etc.), ablation studies on wavelet decomposition levels, and error analyses are presented in detail in Sections 4.1–4.4 of the manuscript for all four tasks. We can partially revise the abstract to include one or two representative performance numbers if space allows, to better highlight the gains without altering its summary nature.
  Revision: partial
- Referee: [Abstract] The description of deploying IWT to reconstruct HR feature maps provides no implementation specifics such as the wavelet family, boundary handling, or how differentiability is ensured for backpropagation, leaving the assumption of artifact-free integration unverified for CNN feature maps.
  Authors: The abstract is intentionally brief. Full implementation details appear in Section 3: we employ the Haar wavelet (orthogonal with perfect reconstruction), use symmetric extension for boundary handling, and note that DWT/IWT are fixed linear operations and thus differentiable, enabling seamless end-to-end backpropagation. Experiments in Section 4 confirm artifact-free integration through visual and quantitative results. No change to the abstract is needed, as these specifics belong in the methods section.
  Revision: no
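The orthogonality and differentiability argument in the second response can be sketched numerically. The example assumes the orthonormal Haar transform written as a 4x4 matrix acting on a flattened 2x2 block (a, b, c, d); the helper names are hypothetical. Because the map y = Hx is linear, its backward pass is multiplication by the transpose of H, and for an orthogonal H that transpose is also the inverse transform.

```python
HAAR = [
    [0.5,  0.5,  0.5,  0.5],
    [0.5,  0.5, -0.5, -0.5],
    [0.5, -0.5,  0.5, -0.5],
    [0.5, -0.5, -0.5,  0.5],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
gram = matmul(HAAR, transpose(HAAR))  # equals the identity for an orthogonal H

block = [1.0, 2.0, 3.0, 4.0]
coeffs = matvec(HAAR, block)                # forward transform of one 2x2 block
restored = matvec(transpose(HAAR), coeffs)  # inverse transform = transpose
print(gram == identity, coeffs, restored)
```

This only covers the interior of the transform; boundary handling such as the symmetric extension mentioned in the response is not modelled here.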
Circularity Check
No circularity: architectural proposal without load-bearing derivations
The paper proposes an MWCNN architecture that embeds discrete wavelet transform (DWT) and inverse wavelet transform (IWT) into a U-Net backbone for image restoration tasks. No equations, predictions, or first-principles derivations are presented that reduce to fitted parameters, self-definitions, or self-citation chains. The core claim is an engineering suggestion (wavelet-based downsampling as an alternative to pooling or dilation), supported by experimental results on standard tasks rather than any mathematical equivalence to its inputs. Self-citations, if present in the full text, are not load-bearing for the central architectural idea. This is a standard non-circular model proposal.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: The core idea is to embed the wavelet transform into the CNN architecture to reduce the resolution of feature maps while at the same time increasing the receptive field. Specifically, MWCNN for image restoration is based on the U-Net architecture, and the inverse wavelet transform (IWT) is deployed to reconstruct the high-resolution (HR) feature maps.
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: DWT can be treated as a downsampling operation and extended to any CNN where a pooling operation is required.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.