Preventing Overfitting in Deep Image Prior for Hyperspectral Image Denoising
Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3
The pith
Combining a Smooth ℓ1 data term, divergence-based regularization, and input optimization prevents overfitting in deep image prior for hyperspectral image denoising.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that overfitting in deep image prior for hyperspectral image denoising can be mitigated by jointly employing a Smooth ℓ1 data fidelity term, a divergence-based sensitivity regularization, and input optimization during training. This combination allows the method to achieve superior denoising performance on real hyperspectral images corrupted by various noise types without requiring early stopping.
What carries the argument
The joint combination of Smooth ℓ1 data fidelity, divergence-based sensitivity regularization, and input optimization, which together constrain the network to reconstruct signal rather than noise in the unsupervised DIP setting.
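To make the robust data term concrete, here is a minimal sketch of a Smooth ℓ1 penalty in its common piecewise form (quadratic below a threshold, linear above it, as in PyTorch's `SmoothL1Loss`). The threshold name `beta`, its default value, and the toy residuals are assumptions for illustration; this page does not state the paper's settings.

```python
def smooth_l1(residuals, beta=1.0):
    """Mean Smooth l1 penalty: quadratic for |r| < beta, linear beyond it."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    total = 0.0
    for r in residuals:
        a = abs(r)
        if a < beta:
            total += 0.5 * a * a / beta   # small residuals: behaves like l2
        else:
            total += a - 0.5 * beta       # large residuals: behaves like l1
    return total / len(residuals)


def mse(residuals):
    """Plain mean squared error, shown only for comparison."""
    return sum(r * r for r in residuals) / len(residuals)


# Hypothetical residuals between a network output and a noisy observation.
small = [0.1, -0.2, 0.05]
with_outlier = small + [10.0]  # one sparse-noise-like outlier

print("smooth l1:", smooth_l1(small), smooth_l1(with_outlier))
print("mse      :", mse(small), mse(with_outlier))
```

The outlier enters the Smooth ℓ1 value linearly but the squared error quadratically, which is the usual argument for preferring a robust term when sparse or stripe noise is present.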
If this is right
- The modified DIP training no longer requires early stopping because performance does not degrade with longer optimization.
- Denoising quality on Gaussian, sparse, and stripe noise exceeds that of prior DIP-based HSI methods.
- The approach preserves spectral details while removing noise in real captured images.
- The same training procedure works across multiple noise models without per-case retuning.
Where Pith is reading between the lines
- The same loss and regularization pattern could be tested on other inverse problems where DIP is applied, such as super-resolution or inpainting.
- Quantitative measurement of overfitting reduction would be possible by running the method on synthetic hyperspectral data with known ground-truth clean images.
- Adjusting the strength of the divergence penalty might yield noise-level-specific schedules that further improve results.
Load-bearing premise
The assumption that adding the Smooth ℓ1 term, divergence regularization, and input optimization will reliably prevent overfitting across different noise types and image contents without discarding useful spectral information or introducing artifacts.
What would settle it
A concrete falsifier would be a new hyperspectral image set where, even after applying the three proposed components, reconstruction quality rises and then falls with additional training iterations, indicating that overfitting still occurs.
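The rise-then-fall pattern described above can be checked mechanically. The sketch below assumes a quality score (for example PSNR against a reference image) has been recorded at regular checkpoints. The function name `overfitting_drop`, the 0.5 tolerance, and both curves are hypothetical and not taken from the paper.

```python
def overfitting_drop(quality, tolerance=0.5):
    """Return (peak_index, drop, flagged) for a per-checkpoint quality curve.

    drop is the gap between the best value seen and the final value;
    flagged is True when that gap exceeds the tolerance.
    """
    if not quality:
        raise ValueError("quality curve is empty")
    peak_index = max(range(len(quality)), key=lambda i: quality[i])
    drop = quality[peak_index] - quality[-1]
    return peak_index, drop, drop > tolerance


# Hypothetical curves: one plateaus, one peaks early and then degrades.
stable = [20.0, 25.0, 28.0, 29.0, 29.2, 29.1]
overfit = [20.0, 26.0, 30.0, 28.0, 25.0, 23.0]

print(overfitting_drop(stable))
print(overfitting_drop(overfit))
```

A flagged curve on a new image set, with all three components enabled, would be the kind of evidence the falsifier asks for.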
Original abstract
Deep image prior (DIP) is an unsupervised deep learning framework that has been successfully applied to a variety of inverse imaging problems. However, DIP-based methods are inherently prone to overfitting, which leads to performance degradation and necessitates early stopping. In this paper, we propose a method to mitigate overfitting in DIP-based hyperspectral image (HSI) denoising by jointly combining robust data fidelity and explicit sensitivity regularization. The proposed approach employs a Smooth $\ell_1$ data term together with a divergence-based regularization and input optimization during training. Experimental results on real HSIs corrupted by Gaussian, sparse, and stripe noise demonstrate that the proposed method effectively prevents overfitting and achieves superior denoising performance compared to state-of-the-art DIP-based HSI denoising methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that combining a Smooth ℓ1 data fidelity term, divergence-based sensitivity regularization, and input optimization within the Deep Image Prior (DIP) framework prevents overfitting for hyperspectral image (HSI) denoising. It reports that this approach eliminates the need for early stopping and yields superior denoising results on real HSIs corrupted by Gaussian, sparse, and stripe noise relative to prior DIP-based HSI methods.
Significance. If the central claim holds, the work supplies a practical, empirically motivated regularization strategy that makes unsupervised DIP more stable for HSI denoising—an important task in remote sensing where noise is prevalent and paired training data are scarce. The explicit use of divergence to control sensitivity is a reasonable engineering choice that could extend to other inverse problems, though its impact is currently supported only by limited experimental evidence.
major comments (2)
- [Experiments] Experiments section: the manuscript asserts that the proposed combination 'effectively prevents overfitting' and achieves superior performance, yet provides no ablation studies isolating the contribution of the Smooth ℓ1 term, the divergence regularization, or the input optimization. Without these controls or plots of reconstruction quality versus iteration count, it remains unclear whether overfitting is genuinely mitigated or whether the gains arise from improved hyperparameter tuning.
- [Method] Method section: the divergence-based sensitivity regularization is presented as central to the overfitting prevention claim, but the manuscript does not supply the explicit formulation (e.g., the precise divergence measure or its Jacobian approximation) or demonstrate that it preserves spectral fidelity across noise types. This detail is load-bearing for the weakest assumption that the regularization reliably avoids artifacts while removing noise.
minor comments (1)
- [Abstract] Abstract: the claim of results on 'real HSIs' would be strengthened by naming the specific datasets or number of test images used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below and will update the manuscript accordingly to improve clarity and provide the requested supporting evidence.
- Referee: Experiments section: the manuscript asserts that the proposed combination 'effectively prevents overfitting' and achieves superior performance, yet provides no ablation studies isolating the contribution of the Smooth ℓ1 term, the divergence regularization, or the input optimization. Without these controls or plots of reconstruction quality versus iteration count, it remains unclear whether overfitting is genuinely mitigated or whether the gains arise from improved hyperparameter tuning.
  Authors: We agree that dedicated ablations and convergence plots would strengthen the presentation. In the revised manuscript we will add an ablation study that isolates each component (Smooth ℓ1 fidelity, divergence regularization, and input optimization) by successively removing them and reporting PSNR/SSIM on the same real HSI test sets. We will also include plots of reconstruction quality versus iteration number for the full method and the standard DIP baseline, showing that the proposed combination avoids the characteristic performance drop associated with overfitting. These additions will help separate the regularization effect from hyperparameter choices.
  Revision: yes
- Referee: Method section: the divergence-based sensitivity regularization is presented as central to the overfitting prevention claim, but the manuscript does not supply the explicit formulation (e.g., the precise divergence measure or its Jacobian approximation) or demonstrate that it preserves spectral fidelity across noise types. This detail is load-bearing for the weakest assumption that the regularization reliably avoids artifacts while removing noise.
  Authors: We acknowledge that the explicit formulation was omitted. The revised manuscript will include the precise expression: the regularization term is the Kullback-Leibler divergence between the empirical input distribution and the network output distribution, with the sensitivity approximated by a Monte Carlo estimate of the Jacobian-vector product. We will also add a short spectral-fidelity analysis reporting per-band PSNR and spectral angle mapper (SAM) values for all three noise types, confirming that the regularization suppresses noise without introducing visible spectral distortions.
  Revision: yes
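The response mentions a Monte Carlo estimate of a Jacobian-vector product without giving details. As a hedged illustration only, the sketch below uses the random-probe, finite-difference construction known from Monte-Carlo SURE (Ramani, Blu, and Unser, listed among the works below) to estimate the divergence of a map, that is, the trace of its Jacobian. It does not implement the Kullback-Leibler term described in the response, and nothing here is confirmed as the paper's formulation. The names `mc_divergence` and `shrink`, the step `eps`, and the toy input are all assumptions.

```python
import random


def mc_divergence(f, y, eps=1e-3, num_probes=1, seed=0):
    """Estimate div f(y) = trace of the Jacobian of f at y.

    Uses random +/-1 probes b and the finite-difference approximation
    J b ~ (f(y + eps * b) - f(y)) / eps, then averages b . (J b).
    """
    rng = random.Random(seed)
    fy = f(y)
    total = 0.0
    for _ in range(num_probes):
        b = [rng.choice((-1.0, 1.0)) for _ in y]
        y_pert = [yi + eps * bi for yi, bi in zip(y, b)]
        f_pert = f(y_pert)
        total += sum(bi * (fp - f0) / eps for bi, fp, f0 in zip(b, f_pert, fy))
    return total / num_probes


def shrink(y):
    """Toy stand-in for a denoiser: scales every entry by 0.5."""
    return [0.5 * v for v in y]


y = [0.3, -1.2, 2.0, 0.7]
# For this linear map the Jacobian is 0.5 * I, so the estimate should be close to 2.0.
print(mc_divergence(shrink, y))
```

In a training loop, a term of this kind would be added to the data fidelity with a weight (the "regularization weights" listed as a free parameter), penalizing outputs that react strongly to small input perturbations.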
Circularity Check
Empirical regularization method with no circular derivation
full rationale
The paper proposes an engineering combination of Smooth ℓ1 fidelity, divergence-based sensitivity regularization, and input optimization to mitigate known DIP overfitting on HSI data. These terms are independently motivated and tested experimentally on real noisy HSIs; no derivation chain, fitted parameter renamed as prediction, or self-citation load-bearing step reduces the central claim to its own inputs by construction. The reported superiority is an empirical outcome, not a tautology.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization weights
axioms (1)
- domain assumption: DIP can be stabilized against overfitting by robust data terms and sensitivity regularization
Reference graph
Works this paper leans on
[1] P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, 2017.
[2] A. Bhargava, A. Sachdeva, K. Sharma, M. H. Alsharif, P. Uthansakul, and M. Uthansakul, “Hyperspectral imaging and its applications: A review,” Heliyon, vol. 10, no. 12, p. e33208, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405844024092399
[3] Q. Zhang, Y. Zheng, Q. Yuan, M. Song, H. Yu, and Y. Xiao, “Hyperspectral image denoising: From model-driven, data-driven, to model-data-driven,” IEEE Trans. Neural Netw. Learn. Syst., Jun. 2023.
[4] M. Joglekar and A. M. Deshpande, “A comprehensive review of hyperspectral image denoising techniques in remote sensing,” International Journal of Remote Sensing, vol. 46, no. 16, pp. 5961–5995, 2025. [Online]. Available: https://doi.org/10.1080/01431161.2025.2527372
[5] H. Fan, C. Li, Y. Guo, G. Kuang, and J. Ma, “Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 10, pp. 6196–6213, 2018.
[6] Q. Yuan, Q. Zhang, J. Li, H. Shen, and L. Zhang, “Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 1205–1218, 2019.
[7] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., June 2018, pp. 9446–9454.
[8] O. Sidorov and J. Y. Hardeberg, “Deep hyperspectral prior: Single-image denoising, inpainting, super-resolution,” in IEEE/CVF Int. Conf. on Comp. Vis. Workshop (ICCVW), 2019, pp. 3844–3851.
[9] T. Tirer, R. Giryes, S. Y. Chun, and Y. C. Eldar, “Deep internal learning: Deep learning from a single input,” IEEE Signal Process. Mag., vol. 41, no. 4, pp. 40–57, 2024.
[10] I. Alkhouri, E. Bell, A. Ghosh, S. Liang, R. Wang, and S. Ravishankar, “Understanding untrained deep models for inverse problems: Algorithms and theory,” 2025, arXiv:2502.18612v1 [eess.IV]. [Online]. Available: https://arxiv.org/html/2502.18612v1
[11] S. Liang, E. Bell, Q. Qu, R. Wang, and S. Ravishankar, “Analysis of deep image prior and exploiting self-guidance for image reconstruction,” IEEE Trans. Comput. Imag., vol. 11, pp. 435–451, 2025.
[12] T. Li, H. Wang, Z. Zhuang, and J. Sun, “Deep random projector: Accelerated deep image prior,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 18 176–18 185.
[13] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” The Annals of Statistics, vol. 9, no. 6, pp. 1135–1151. [Online]. Available: http://www.jstor.org/stable/2240405
[14] C. A. Metzler, A. Mousavi, R. Heckel, and R. G. Baraniuk, “Unsupervised learning with Stein’s unbiased risk estimator,” 2020, arXiv:1805.10531v3. [Online]. Available: https://arxiv.org/abs/1805.10531
[15] H. V. Nguyen, M. O. Ulfarsson, and J. R. Sveinsson, “Hyperspectral image denoising using SURE-based unsupervised convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 4, pp. 3369–3382, 2021.
[16] K. F. Niresi and C.-Y. Chi, “Unsupervised hyperspectral denoising based on deep image prior and least favorable distribution,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 5967–5983, 2022.
[17] S. Ramani, T. Blu, and M. Unser, “Monte-Carlo SURE: A black-box optimization of regularization parameters for general denoising algorithms,” IEEE Trans. Image Process., vol. 17, no. 9, pp. 1540–1554, 2008.
[18] S. Theodoridis, Machine Learning: From the Classics to Deep Networks, Transformers and Diffusion Models, 3rd ed. Academic Press, Inc., 2025.
[19] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. USA: Academic Press, Inc., 2008.
Fig. 4 panels: (a) Original, (b) Noisy, (c) SURE-DHIP [15], (d) HLF-DHIP [16], (e) Proposed. Caption: Original Washington DC Ma...