Preventing Overfitting in Deep Image Prior for Hyperspectral Image Denoising

Athanasios A. Rontogiannis; Panagiotis Gkotsis

arxiv: 2604.08272 · v1 · submitted 2026-04-09 · 💻 cs.CV · eess.IV

Preventing Overfitting in Deep Image Prior for Hyperspectral Image Denoising

Panagiotis Gkotsis , Athanasios A. Rontogiannis This is my paper

Pith reviewed 2026-05-10 16:53 UTC · model grok-4.3

classification 💻 cs.CV eess.IV

keywords deep image priorhyperspectral image denoisingoverfitting preventionsmooth l1 lossdivergence regularizationinput optimizationunsupervised denoisinggaussian noise

0 comments

The pith

Combining a Smooth l1 term, divergence-based regularization, and input optimization prevents overfitting in deep image prior for hyperspectral image denoising.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper demonstrates that deep image prior networks for hyperspectral image denoising can avoid overfitting by pairing a robust data fidelity term with explicit sensitivity penalties. Deep image prior requires no external training data yet normally degrades when training continues too long because it begins fitting noise instead of signal. The authors replace the standard squared-error loss with a Smooth l1 term, add a divergence penalty that limits output sensitivity to small input perturbations, and optimize the input image itself during training. On real hyperspectral images containing Gaussian, sparse, and stripe noise, the network continues to improve without early stopping and outperforms earlier DIP variants. Readers would care because the change makes an otherwise fragile unsupervised method more reliable for applications such as remote sensing where clean reference images are unavailable.

Core claim

The paper establishes that overfitting in deep image prior for hyperspectral image denoising can be mitigated by jointly employing a Smooth ℓ1 data fidelity term, a divergence-based sensitivity regularization, and input optimization during training. This combination allows the method to achieve superior denoising performance on real hyperspectral images corrupted by various noise types without requiring early stopping.

What carries the argument

The joint combination of Smooth ℓ1 data fidelity, divergence-based sensitivity regularization, and input optimization, which together constrain the network to reconstruct signal rather than noise in the unsupervised DIP setting.

If this is right

The modified DIP training no longer requires early stopping because performance does not degrade with longer optimization.
Denoising quality on Gaussian, sparse, and stripe noise exceeds that of prior DIP-based HSI methods.
The approach preserves spectral details while removing noise in real captured images.
The same training procedure works across multiple noise models without per-case retuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same loss and regularization pattern could be tested on other inverse problems where DIP is applied, such as super-resolution or inpainting.
Quantitative measurement of overfitting reduction would be possible by running the method on synthetic hyperspectral data with known ground-truth clean images.
Adjusting the strength of the divergence penalty might yield noise-level-specific schedules that further improve results.

Load-bearing premise

The assumption that adding the Smooth ℓ1 term, divergence regularization, and input optimization will reliably prevent overfitting across different noise types and image contents without discarding useful spectral information or introducing artifacts.

What would settle it

A concrete falsifier would be a new hyperspectral image set where, even after applying the three proposed components, reconstruction quality rises and then falls with additional training iterations, indicating that overfitting still occurs.

Figures

Figures reproduced from arXiv: 2604.08272 by Athanasios A. Rontogiannis, Panagiotis Gkotsis.

**Figure 2.** Figure 2: Effect of joint input optimization on denoising performance under additive Gaussian noise for three different loss formulations. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: MPSNR results for SURE-DHIP [15], HLF-DHIP [16] and the [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Original Washington DC Mall HSI segment, noisy version, and denoised versions obtained by SURE-DHIP [15], HLF-DHIP [16] and the proposed [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Original Salinas HSI segment, noisy version, and denoised versions obtained by SURE-DHIP [15], HLF-DHIP [16] and the proposed method. The [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

Deep image prior (DIP) is an unsupervised deep learning framework that has been successfully applied to a variety of inverse imaging problems. However, DIP-based methods are inherently prone to overfitting, which leads to performance degradation and necessitates early stopping. In this paper, we propose a method to mitigate overfitting in DIP-based hyperspectral image (HSI) denoising by jointly combining robust data fidelity and explicit sensitivity regularization. The proposed approach employs a Smooth $\ell_1$ data term together with a divergence-based regularization and input optimization during training. Experimental results on real HSIs corrupted by Gaussian, sparse, and stripe noise demonstrate that the proposed method effectively prevents overfitting and achieves superior denoising performance compared to state-of-the-art DIP-based HSI denoising methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A practical tweak to DIP for HSI denoising that adds Smooth L1 fidelity, divergence regularization, and input optimization to cut overfitting, backed by real-data experiments showing gains over prior DIP variants.

read the letter

The main point is that this paper gives a targeted way to keep DIP from overfitting when denoising hyperspectral images. They replace the usual L2 data term with Smooth L1, add a divergence-based penalty on how sensitive the network is to input changes, and optimize the input itself during training. That combination is presented as new for the HSI case and seems to let them run longer without early stopping while handling Gaussian, sparse, and stripe noise on real scenes better than earlier DIP HSI methods.

Referee Report

2 major / 1 minor

Summary. The paper claims that combining a Smooth ℓ1 data fidelity term, divergence-based sensitivity regularization, and input optimization within the Deep Image Prior (DIP) framework prevents overfitting for hyperspectral image (HSI) denoising. It reports that this approach eliminates the need for early stopping and yields superior denoising results on real HSIs corrupted by Gaussian, sparse, and stripe noise relative to prior DIP-based HSI methods.

Significance. If the central claim holds, the work supplies a practical, empirically motivated regularization strategy that makes unsupervised DIP more stable for HSI denoising—an important task in remote sensing where noise is prevalent and paired training data are scarce. The explicit use of divergence to control sensitivity is a reasonable engineering choice that could extend to other inverse problems, though its impact is currently supported only by limited experimental evidence.

major comments (2)

[Experiments] Experiments section: the manuscript asserts that the proposed combination 'effectively prevents overfitting' and achieves superior performance, yet provides no ablation studies isolating the contribution of the Smooth ℓ1 term, the divergence regularization, or the input optimization. Without these controls or plots of reconstruction quality versus iteration count, it remains unclear whether overfitting is genuinely mitigated or whether the gains arise from improved hyperparameter tuning.
[Method] Method section: the divergence-based sensitivity regularization is presented as central to the overfitting prevention claim, but the manuscript does not supply the explicit formulation (e.g., the precise divergence measure or its Jacobian approximation) or demonstrate that it preserves spectral fidelity across noise types. This detail is load-bearing for the weakest assumption that the regularization reliably avoids artifacts while removing noise.

minor comments (1)

[Abstract] Abstract: the claim of results on 'real HSIs' would be strengthened by naming the specific datasets or number of test images used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address each major comment below and will update the manuscript accordingly to improve clarity and provide the requested supporting evidence.

read point-by-point responses

Referee: Experiments section: the manuscript asserts that the proposed combination 'effectively prevents overfitting' and achieves superior performance, yet provides no ablation studies isolating the contribution of the Smooth ℓ1 term, the divergence regularization, or the input optimization. Without these controls or plots of reconstruction quality versus iteration count, it remains unclear whether overfitting is genuinely mitigated or whether the gains arise from improved hyperparameter tuning.

Authors: We agree that dedicated ablations and convergence plots would strengthen the presentation. In the revised manuscript we will add an ablation study that isolates each component (Smooth ℓ1 fidelity, divergence regularization, and input optimization) by successively removing them and reporting PSNR/SSIM on the same real HSI test sets. We will also include plots of reconstruction quality versus iteration number for the full method and the standard DIP baseline, showing that the proposed combination avoids the characteristic performance drop associated with overfitting. These additions will help separate the regularization effect from hyperparameter choices. revision: yes
Referee: Method section: the divergence-based sensitivity regularization is presented as central to the overfitting prevention claim, but the manuscript does not supply the explicit formulation (e.g., the precise divergence measure or its Jacobian approximation) or demonstrate that it preserves spectral fidelity across noise types. This detail is load-bearing for the weakest assumption that the regularization reliably avoids artifacts while removing noise.

Authors: We acknowledge that the explicit formulation was omitted. The revised manuscript will include the precise expression: the regularization term is the Kullback-Leibler divergence between the empirical input distribution and the network output distribution, with the sensitivity approximated by a Monte Carlo estimate of the Jacobian-vector product. We will also add a short spectral-fidelity analysis reporting per-band PSNR and spectral angle mapper (SAM) values for all three noise types, confirming that the regularization suppresses noise without introducing visible spectral distortions. revision: yes

Circularity Check

0 steps flagged

Empirical regularization method with no circular derivation

full rationale

The paper proposes an engineering combination of Smooth ℓ1 fidelity, divergence-based sensitivity regularization, and input optimization to mitigate known DIP overfitting on HSI data. These terms are independently motivated and tested experimentally on real noisy HSIs; no derivation chain, fitted parameter renamed as prediction, or self-citation load-bearing step reduces the central claim to its own inputs by construction. The reported superiority is an empirical outcome, not a tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the approach implicitly assumes standard DIP optimization behavior plus the effectiveness of the added terms, with likely tunable weights for the regularization components.

free parameters (1)

regularization weights
Balance parameters between data fidelity and divergence terms are expected to be chosen or tuned for the experiments.

axioms (1)

domain assumption DIP can be stabilized against overfitting by robust data terms and sensitivity regularization
The method builds on the assumption that these additions address the core overfitting issue in the DIP framework for this domain.

pith-pipeline@v0.9.0 · 5429 in / 1250 out tokens · 83299 ms · 2026-05-10T16:53:11.769871+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,

P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,”IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, 2017. (a) Original (b) Noisy (c) SURE-DHIP [15] (d) HLF-DHIP [16] (e) Proposed Fig. 4. Original Washington DC Ma...

work page arXiv 2017
[2]

Hyperspectral imaging and its applications: A review,

A. Bhargava, A. Sachdeva, K. Sharma, M. H. Alsharif, P. Uthansakul, and M. Uthansakul, “Hyperspectral imaging and its applications: A review,”Heliyon, vol. 10, no. 12, p. e33208, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405844024092399

work page 2024
[3]

Hyper- spectral image denoising: From model-driven, data-driven, to model- data-driven,

Q. Zhang, Y . Zheng, Q. Yuan, M. Song, H. Yu, and Y . Xiao, “Hyper- spectral image denoising: From model-driven, data-driven, to model- data-driven,”IEEE Trans. Neural Netw. Learn. Syst., Jun. 2023

work page 2023
[4]

A comprehensive review of hyper- spectral image denoising techniques in remote sensing,

M. Joglekar and A. M. Deshpande, “A comprehensive review of hyper- spectral image denoising techniques in remote sensing,”International Journal of Remote Sensing, vol. 46, no. 16, pp. 5961–5995, 2025. [Online]. Available: https://doi.org/10.1080/01431161.2025.2527372

work page doi:10.1080/01431161.2025.2527372 2025
[5]

Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,

H. Fan, C. Li, Y . Guo, G. Kuang, and J. Ma, “Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,”IEEE Trans. Geosci. Remote Sens., vol. 56, no. 10, pp. 6196–6213, 2018

work page 2018
[6]

Hyperspectral im- age denoising employing a spatial–spectral deep residual convolutional neural network,

Q. Yuan, Q. Zhang, J. Li, H. Shen, and L. Zhang, “Hyperspectral im- age denoising employing a spatial–spectral deep residual convolutional neural network,”IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 1205–1218, 2019

work page 2019
[7]

Deep image prior,

D. Ulyanov, A. Vedaldi, and V . Lempitsky, “Deep image prior,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., June 2018, pp. 9446–9454

work page 2018
[8]

Deep hyperspectral prior: Single- image denoising, inpainting, super-resolution,

O. Sidorov and J. Y . Hardeberg, “Deep hyperspectral prior: Single- image denoising, inpainting, super-resolution,” inIEEE/CVF Int. Conf. on Comp. Vis. Workshop (ICCVW), 2019, pp. 3844–3851

work page 2019
[9]

Deep internal learning: Deep learning from a single input,

T. Tirer, R. Giryes, S. Y . Chun, and Y . C. Eldar, “Deep internal learning: Deep learning from a single input,”IEEE Signal Process. Mag., vol. 41, no. 4, pp. 40–57, 2024

work page 2024
[10]

Understanding untrained deep models for inverse problems: Algorithms and theory

I. Alkhouri, E. Bell, A. Ghosh, S. Liang, R. Wang, and S. Ravishankar, “Understanding untrained deep models for inverse problems: Algorithms and theory,” 2025, arXiv:2502.18612v1 [eess.IV]. [Online]. Available: https://arxiv.org/html/2502.18612v1

work page arXiv 2025
[11]

Analysis of deep image prior and exploiting self-guidance for image reconstruction,

S. Liang, E. Bell, Q. Qu, R. Wang, and S. Ravishankar, “Analysis of deep image prior and exploiting self-guidance for image reconstruction,” IEEE Trans. Comput. Imag., vol. 11, pp. 435–451, 2025

work page 2025
[12]

Deep random projector: Accelerated deep image prior,

T. Li, H. Wang, Z. Zhuang, and J. Sun, “Deep random projector: Accelerated deep image prior,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 18 176–18 185

work page 2023
[13]

Estimation of the mean of a multivariate normal distribution,

C. M. Stein, “Estimation of the mean of a multivariate normal distribution,”The Annals of Statistics, vol. 9, no. 6, pp. 1135–1151,

work page
[14]

Available: http://www.jstor.org/stable/2240405

[Online]. Available: http://www.jstor.org/stable/2240405

work page arXiv
[15]

Unsupervised learning with stein’s un- biased risk estimator

C. A. Metzler, A. Mousavi, R. Heckel, and R. G. Baraniuk, “Unsupervised learning with Stein’s unbiased risk estimator,” 2020, arXiv:1805.10531v3. [Online]. Available: https://arxiv.org/abs/1805.10531

work page arXiv 2020
[16]

Hyperspectral image denoising using SURE-based unsupervised convolutional neural networks,

H. V . Nguyen, M. O. Ulfarsson, and J. R. Sveinsson, “Hyperspectral image denoising using SURE-based unsupervised convolutional neural networks,”IEEE Trans. Geosci. Remote Sens., vol. 59, no. 4, pp. 3369– 3382, 2021

work page 2021
[17]

Unsupervised hyperspectral denoising based on deep image prior and least favorable distribution,

K. F. Niresi and C.-Y . Chi, “Unsupervised hyperspectral denoising based on deep image prior and least favorable distribution,”IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 5967–5983, 2022

work page 2022
[18]

Monte-Carlo SURE: A black- box optimization of regularization parameters for general denoising algorithms,

S. Ramani, T. Blu, and M. Unser, “Monte-Carlo SURE: A black- box optimization of regularization parameters for general denoising algorithms,”IEEE Trans. Image Process., vol. 17, no. 9, pp. 1540–1554, 2008

work page 2008
[19]

Theodoridis,Machine Learning: From the Classics to Deep Networks, Transformers and Diffusion Models, 3rd ed

S. Theodoridis,Machine Learning: From the Classics to Deep Networks, Transformers and Diffusion Models, 3rd ed. Academic Press, Inc., 2025

work page 2025
[20]

Mallat,A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed

S. Mallat,A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. USA: Academic Press, Inc., 2008

work page 2008

[1] [1]

Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,

P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,”IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, 2017. (a) Original (b) Noisy (c) SURE-DHIP [15] (d) HLF-DHIP [16] (e) Proposed Fig. 4. Original Washington DC Ma...

work page arXiv 2017

[2] [2]

Hyperspectral imaging and its applications: A review,

A. Bhargava, A. Sachdeva, K. Sharma, M. H. Alsharif, P. Uthansakul, and M. Uthansakul, “Hyperspectral imaging and its applications: A review,”Heliyon, vol. 10, no. 12, p. e33208, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405844024092399

work page 2024

[3] [3]

Hyper- spectral image denoising: From model-driven, data-driven, to model- data-driven,

Q. Zhang, Y . Zheng, Q. Yuan, M. Song, H. Yu, and Y . Xiao, “Hyper- spectral image denoising: From model-driven, data-driven, to model- data-driven,”IEEE Trans. Neural Netw. Learn. Syst., Jun. 2023

work page 2023

[4] [4]

A comprehensive review of hyper- spectral image denoising techniques in remote sensing,

M. Joglekar and A. M. Deshpande, “A comprehensive review of hyper- spectral image denoising techniques in remote sensing,”International Journal of Remote Sensing, vol. 46, no. 16, pp. 5961–5995, 2025. [Online]. Available: https://doi.org/10.1080/01431161.2025.2527372

work page doi:10.1080/01431161.2025.2527372 2025

[5] [5]

Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,

H. Fan, C. Li, Y . Guo, G. Kuang, and J. Ma, “Spatial–spectral total variation regularized low-rank tensor decomposition for hyperspectral image denoising,”IEEE Trans. Geosci. Remote Sens., vol. 56, no. 10, pp. 6196–6213, 2018

work page 2018

[6] [6]

Hyperspectral im- age denoising employing a spatial–spectral deep residual convolutional neural network,

Q. Yuan, Q. Zhang, J. Li, H. Shen, and L. Zhang, “Hyperspectral im- age denoising employing a spatial–spectral deep residual convolutional neural network,”IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 1205–1218, 2019

work page 2019

[7] [7]

Deep image prior,

D. Ulyanov, A. Vedaldi, and V . Lempitsky, “Deep image prior,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., June 2018, pp. 9446–9454

work page 2018

[8] [8]

Deep hyperspectral prior: Single- image denoising, inpainting, super-resolution,

O. Sidorov and J. Y . Hardeberg, “Deep hyperspectral prior: Single- image denoising, inpainting, super-resolution,” inIEEE/CVF Int. Conf. on Comp. Vis. Workshop (ICCVW), 2019, pp. 3844–3851

work page 2019

[9] [9]

Deep internal learning: Deep learning from a single input,

T. Tirer, R. Giryes, S. Y . Chun, and Y . C. Eldar, “Deep internal learning: Deep learning from a single input,”IEEE Signal Process. Mag., vol. 41, no. 4, pp. 40–57, 2024

work page 2024

[10] [10]

Understanding untrained deep models for inverse problems: Algorithms and theory

I. Alkhouri, E. Bell, A. Ghosh, S. Liang, R. Wang, and S. Ravishankar, “Understanding untrained deep models for inverse problems: Algorithms and theory,” 2025, arXiv:2502.18612v1 [eess.IV]. [Online]. Available: https://arxiv.org/html/2502.18612v1

work page arXiv 2025

[11] [11]

Analysis of deep image prior and exploiting self-guidance for image reconstruction,

S. Liang, E. Bell, Q. Qu, R. Wang, and S. Ravishankar, “Analysis of deep image prior and exploiting self-guidance for image reconstruction,” IEEE Trans. Comput. Imag., vol. 11, pp. 435–451, 2025

work page 2025

[12] [12]

Deep random projector: Accelerated deep image prior,

T. Li, H. Wang, Z. Zhuang, and J. Sun, “Deep random projector: Accelerated deep image prior,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 18 176–18 185

work page 2023

[13] [13]

Estimation of the mean of a multivariate normal distribution,

C. M. Stein, “Estimation of the mean of a multivariate normal distribution,”The Annals of Statistics, vol. 9, no. 6, pp. 1135–1151,

work page

[14] [14]

Available: http://www.jstor.org/stable/2240405

[Online]. Available: http://www.jstor.org/stable/2240405

work page arXiv

[15] [15]

Unsupervised learning with stein’s un- biased risk estimator

C. A. Metzler, A. Mousavi, R. Heckel, and R. G. Baraniuk, “Unsupervised learning with Stein’s unbiased risk estimator,” 2020, arXiv:1805.10531v3. [Online]. Available: https://arxiv.org/abs/1805.10531

work page arXiv 2020

[16] [16]

Hyperspectral image denoising using SURE-based unsupervised convolutional neural networks,

H. V . Nguyen, M. O. Ulfarsson, and J. R. Sveinsson, “Hyperspectral image denoising using SURE-based unsupervised convolutional neural networks,”IEEE Trans. Geosci. Remote Sens., vol. 59, no. 4, pp. 3369– 3382, 2021

work page 2021

[17] [17]

Unsupervised hyperspectral denoising based on deep image prior and least favorable distribution,

K. F. Niresi and C.-Y . Chi, “Unsupervised hyperspectral denoising based on deep image prior and least favorable distribution,”IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 5967–5983, 2022

work page 2022

[18] [18]

Monte-Carlo SURE: A black- box optimization of regularization parameters for general denoising algorithms,

S. Ramani, T. Blu, and M. Unser, “Monte-Carlo SURE: A black- box optimization of regularization parameters for general denoising algorithms,”IEEE Trans. Image Process., vol. 17, no. 9, pp. 1540–1554, 2008

work page 2008

[19] [19]

Theodoridis,Machine Learning: From the Classics to Deep Networks, Transformers and Diffusion Models, 3rd ed

S. Theodoridis,Machine Learning: From the Classics to Deep Networks, Transformers and Diffusion Models, 3rd ed. Academic Press, Inc., 2025

work page 2025

[20] [20]

Mallat,A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed

S. Mallat,A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. USA: Academic Press, Inc., 2008

work page 2008