UHD-GPGNet: UHD Video Denoising via Gaussian-Process-Guided Local Spatio-Temporal Modeling
Pith reviewed 2026-05-10 16:38 UTC · model grok-4.3
The pith
UHD-GPGNet guides UHD video denoising with sparse Gaussian-process posteriors on local spatio-temporal descriptors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UHD-GPGNet estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which then guide adaptive temporal-detail fusion inside a structure-color collaborative reconstruction network.
What carries the argument
Sparse GP posterior statistics computed on compact spatio-temporal descriptors, used to characterize local degradation and uncertainty for guiding adaptive fusion.
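As a rough illustration of what "sparse GP posterior statistics" means in practice, the sketch below computes a posterior mean and variance at query descriptors from a small set of inducing points, using the standard deterministic-training-conditional (DTC) sparse approximation. Everything in it is an assumption made for illustration: the RBF kernel, the names `rbf_kernel` and `sparse_gp_posterior`, the hyperparameter values, and the choice of approximation are not taken from the paper, whose kernel and inference details are not given in the text above.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def sparse_gp_posterior(x_train, y_train, z_inducing, x_query,
                        noise_var=1e-2, lengthscale=1.0, variance=1.0,
                        jitter=1e-6):
    """Posterior mean and variance at x_query under the DTC approximation.

    Hypothetical helper: descriptors are plain feature vectors here, and the
    inducing points z_inducing are supplied by the caller rather than learned.
    """
    m = z_inducing.shape[0]
    k_uu = rbf_kernel(z_inducing, z_inducing, lengthscale, variance)
    k_uu = k_uu + jitter * np.eye(m)          # numerical stabilisation
    k_uf = rbf_kernel(z_inducing, x_train, lengthscale, variance)
    k_us = rbf_kernel(z_inducing, x_query, lengthscale, variance)

    # A = K_uu + K_uf K_fu / noise_var; its inverse is the matrix Sigma.
    a = k_uu + k_uf @ k_uf.T / noise_var
    l_a = np.linalg.cholesky(a)
    l_uu = np.linalg.cholesky(k_uu)

    # Mean: K_su Sigma K_uf y / noise_var.
    alpha = np.linalg.solve(a, k_uf @ y_train / noise_var)
    mean = k_us.T @ alpha

    # Variance: k_ss - K_su K_uu^{-1} K_us + K_su Sigma K_us (diagonal only).
    v_uu = np.linalg.solve(l_uu, k_us)
    v_a = np.linalg.solve(l_a, k_us)
    var = variance - np.sum(v_uu**2, axis=0) + np.sum(v_a**2, axis=0)
    return mean, np.maximum(var, 0.0)
```

In this sketch the variance rises toward the prior variance for descriptors far from the inducing set, which is the kind of signal a fusion stage could use to down-weight unreliable temporal evidence. How UHD-GPGNet actually consumes these statistics is not specified in the text above.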
Load-bearing premise
That sparse GP posterior statistics over compact descriptors can reliably characterize local degradation response and uncertainty without introducing artifacts or instability during fusion.
What would settle it
Side-by-side 4K video frames in which UHD-GPGNet produces more visible artifacts or temporal instability than an implicit baseline on the same mixed-degradation sequence.
Original abstract
Ultra-high-definition (UHD) video denoising requires simultaneously suppressing complex spatio-temporal degradations, preserving fine textures and chromatic stability, and maintaining efficient full-resolution 4K deployment. In this paper, we propose UHD-GPGNet, a Gaussian-process-guided local spatio-temporal denoising framework that addresses these requirements jointly. Rather than relying on implicit feature learning alone, the method estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which then guide adaptive temporal-detail fusion. A structure-color collaborative reconstruction head decouples luminance, chroma, and high-frequency correction, while a heteroscedastic objective and overlap-tiled inference further stabilize optimization and enable memory-bounded 4K deployment. Experiments on UVG and RealisVideo-4K show that UHD-GPGNet achieves competitive restoration fidelity with substantially fewer parameters than existing methods, enables real-time full-resolution 4K inference with significant speedup over the closest quality competitor, and maintains robust performance across a multi-level mixed-degradation schedule. A real-world study on phone-captured 4K video further confirms that the model, trained entirely on synthetic degradation, generalizes to unseen real sensor noise and improves downstream object detection under challenging conditions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes UHD-GPGNet, a Gaussian-process-guided local spatio-temporal denoising framework for UHD video. It estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which guide adaptive temporal-detail fusion in a neural architecture. Additional components include a structure-color collaborative reconstruction head, a heteroscedastic objective, and overlap-tiled inference to enable memory-efficient real-time 4K processing. On the UVG and RealisVideo-4K datasets, the authors report competitive restoration fidelity with fewer parameters than prior methods, a significant speedup for full-resolution 4K inference, robustness across multi-level mixed degradations, and generalization from synthetic training to real phone-captured 4K video noise, with benefits for downstream tasks such as object detection.
Significance. If the central claims hold with supporting quantitative evidence, the work would offer a meaningful advance in efficient, uncertainty-aware UHD video denoising by combining explicit GP modeling with neural reconstruction, potentially improving parameter efficiency and cross-degradation robustness over purely implicit learning approaches.
major comments (2)
- [Abstract] Abstract: the headline claims of competitive fidelity, substantially fewer parameters, real-time 4K inference with speedup, and robust generalization across mixed degradations are asserted without any quantitative tables, metrics, error bars, or ablation results in the provided text, preventing verification of the performance advantages.
- [Method] Method (GP guidance description): the assertion that sparse GP posterior mean/variance statistics over spatio-temporal descriptors explicitly guide adaptive temporal-detail fusion without introducing temporal inconsistency or high-frequency artifacts lacks any derivation, kernel specification, or stability analysis under the heteroscedastic objective and overlap-tiled inference; this mechanism is load-bearing for attributing the claimed parameter efficiency and generalization to the GP component rather than the base network.
minor comments (1)
- [Abstract] Abstract: the final sentence on the real-world phone-captured study is informative but could be tightened to focus on the core technical contribution.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our submission. Below, we provide point-by-point responses to the major comments and indicate the revisions we will make to the manuscript.
- Referee: [Abstract] Abstract: the headline claims of competitive fidelity, substantially fewer parameters, real-time 4K inference with speedup, and robust generalization across mixed degradations are asserted without any quantitative tables, metrics, error bars, or ablation results in the provided text, preventing verification of the performance advantages.
Authors: The abstract serves as a concise summary of the paper's contributions and results. The quantitative tables, metrics (such as PSNR and SSIM), error bars, ablation results, and comparisons demonstrating competitive fidelity, fewer parameters, real-time 4K inference, and generalization are all present in the Experiments section and supplementary material of the full manuscript. This allows readers to verify the claims. If the submission process led to only the abstract being reviewed, we can ensure the full document is provided. No changes to the abstract text are necessary as tables are not typically included there. revision: no
- Referee: [Method] Method (GP guidance description): the assertion that sparse GP posterior mean/variance statistics over spatio-temporal descriptors explicitly guide adaptive temporal-detail fusion without introducing temporal inconsistency or high-frequency artifacts lacks any derivation, kernel specification, or stability analysis under the heteroscedastic objective and overlap-tiled inference; this mechanism is load-bearing for attributing the claimed parameter efficiency and generalization to the GP component rather than the base network.
Authors: The referee correctly identifies that the current description of the GP guidance could be more detailed. We will revise the Method section to include: (1) the specific kernel specification (e.g., the spatio-temporal RBF kernel), (2) a derivation of the sparse GP posterior mean and variance over the descriptors, and (3) a stability analysis under the heteroscedastic objective and overlap-tiled inference, including discussion on why temporal inconsistency and high-frequency artifacts are avoided. This will better support the claims regarding parameter efficiency and generalization attributable to the GP component. We appreciate this suggestion for improving clarity. revision: yes
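For orientation, the quantities the authors promise to specify have a standard textbook form. The block below writes out an RBF kernel with per-dimension lengthscales and the sparse GP posterior mean and variance under an inducing-point (DTC-style) approximation. The notation is assumed for this sketch and is not the manuscript's: d and d' are descriptor vectors, u indexes M inducing descriptors, f indexes N training descriptors with targets y, and \sigma_n^2 is the observation-noise variance.

```latex
\begin{align}
% Assumed RBF kernel over descriptors, one lengthscale \ell_j per dimension.
k(d, d') &= \sigma_f^2 \exp\!\Big(-\tfrac{1}{2} \sum_{j}
            \frac{(d_j - d'_j)^2}{\ell_j^2}\Big) \\
% Posterior covariance factor over the inducing variables.
\Sigma &= \big(K_{uu} + \sigma_n^{-2} K_{uf} K_{fu}\big)^{-1} \\
% Posterior mean at a query descriptor d_*.
\mu_* &= \sigma_n^{-2}\, k_{*u}\, \Sigma\, K_{uf}\, y \\
% Posterior variance at d_*.
\sigma_*^2 &= k_{**} - k_{*u} K_{uu}^{-1} k_{u*} + k_{*u}\, \Sigma\, k_{u*}
\end{align}
```

The cost is dominated by forming K_{uf} K_{fu}, which scales as O(N M^2) rather than the O(N^3) of an exact GP. Whether the revised Method section adopts this particular approximation remains to be seen.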
Circularity Check
No circularity: derivation builds on standard GP and network components without self-referential reduction
The paper presents UHD-GPGNet as a framework that estimates sparse GP posterior statistics over spatio-temporal descriptors to guide adaptive fusion, using a structure-color reconstruction head, heteroscedastic objective, and overlap-tiled inference. No equations or derivations in the provided abstract or description reduce the claimed performance (fidelity, real-time 4K, robustness) directly to fitted parameters or prior self-citations by construction. The method is explicitly described as combining established GP posterior computation with neural components, without renaming known results, smuggling ansatzes via self-citation, or treating fitted inputs as predictions. The central claim remains independent of any load-bearing self-referential step, making the derivation self-contained against external benchmarks.