arxiv: 2604.11014 · v1 · submitted 2026-04-13 · 💻 cs.CV

UHD-GPGNet: UHD Video Denoising via Gaussian-Process-Guided Local Spatio-Temporal Modeling

Pith reviewed 2026-05-10 16:38 UTC · model grok-4.3

classification 💻 cs.CV
keywords UHD video denoising · Gaussian process modeling · spatio-temporal fusion · 4K real-time inference · video restoration · sensor noise generalization

The pith

UHD-GPGNet guides UHD video denoising with sparse Gaussian-process posteriors on local spatio-temporal descriptors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UHD-GPGNet, a denoising network for ultra-high-definition video that augments implicit learning with explicit Gaussian process modeling. It computes sparse GP posterior statistics over compact spatio-temporal descriptors to describe local degradation response and uncertainty, then uses those statistics to direct adaptive fusion of temporal details. A separate structure-color reconstruction head handles luminance, chroma, and high-frequency correction, supported by a heteroscedastic loss and overlap-tiled inference for stable 4K operation. On standard UHD benchmarks the model matches or exceeds prior quality while using far fewer parameters, runs in real time at full 4K resolution, and transfers from synthetic training data to real phone-captured noise.
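
Overlap-tiled inference is the part of this summary that is easiest to picture in code. The sketch below is a generic NumPy version, not the paper's implementation: the function names (`tile_starts`, `tiled_apply`), the tile size, the overlap, and the Hann-style blending window are all assumptions chosen for illustration. It shows how a frame can be processed one tile at a time and cross-faded in the overlaps, so that peak memory is bounded by the tile rather than by the full 4K frame.

```python
import numpy as np

def tile_starts(length, tile, stride):
    """Start offsets that cover [0, length), with the last tile flush to the end."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts

def tiled_apply(frame, fn, tile=256, overlap=32):
    """Run fn on overlapping tiles of an (H, W, C) frame and blend the results.

    fn is a stand-in for the denoiser and must return an array with the same
    shape as its input tile. Tile size and overlap are illustrative defaults.
    """
    h, w = frame.shape[:2]
    th, tw = min(tile, h), min(tile, w)
    if overlap >= min(th, tw):
        raise ValueError("overlap must be smaller than the tile size")
    # Strictly positive separable window: tile borders get low weight,
    # so neighbouring tiles cross-fade instead of meeting at a hard seam.
    win = np.outer(np.hanning(th + 2)[1:-1], np.hanning(tw + 2)[1:-1])
    out = np.zeros(frame.shape, dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    for y in tile_starts(h, th, th - overlap):
        for x in tile_starts(w, tw, tw - overlap):
            patch = fn(frame[y:y + th, x:x + tw])
            out[y:y + th, x:x + tw] += patch * win[..., None]
            weight[y:y + th, x:x + tw] += win
    # Normalise by the accumulated window weight at every pixel.
    return out / weight[..., None]
```

With an identity `fn` the blended output reproduces the input exactly, which is a convenient sanity check before plugging in a real model.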

Core claim

UHD-GPGNet estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which then guide adaptive temporal-detail fusion inside a structure-color collaborative reconstruction network.

What carries the argument

Sparse GP posterior statistics computed on compact spatio-temporal descriptors, used to characterize local degradation and uncertainty for guiding adaptive fusion.
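
As a rough illustration of what "sparse GP posterior statistics" can mean in practice, the following NumPy sketch computes a predictive mean and variance from a small set of inducing inputs using the textbook deterministic training conditional (DTC) approximation. The paper's kernel, descriptors, and sparse approximation are not specified in the text available here, so the RBF kernel, the function names, and every hyperparameter below are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def sparse_gp_posterior(x, y, z, x_query, noise_var=0.01, jitter=1e-6):
    """DTC predictive mean and variance at x_query using inducing inputs z.

    x: (n, d) descriptors, y: (n,) targets, z: (m, d) inducing inputs.
    All hyperparameters are illustrative assumptions.
    """
    k_uu = rbf_kernel(z, z) + jitter * np.eye(len(z))
    k_uf = rbf_kernel(z, x)
    k_su = rbf_kernel(x_query, z)
    # Only an m x m system is solved, so the cost grows with m, not n.
    a = k_uu + k_uf @ k_uf.T / noise_var
    mean = k_su @ np.linalg.solve(a, k_uf @ y) / noise_var
    prior_var = np.ones(len(x_query))  # k(x, x) = 1 for the default RBF variance
    explained = np.sum(k_su * np.linalg.solve(k_uu, k_su.T).T, axis=1)
    correction = np.sum(k_su * np.linalg.solve(a, k_su.T).T, axis=1)
    var = np.maximum(prior_var - explained + correction, 0.0)
    return mean, var
```

Far from every inducing input the variance returns to the prior value, which is the behaviour an uncertainty-guided fusion stage would rely on to flag regions where the local model has little support.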

Load-bearing premise

That sparse GP posterior statistics over compact descriptors can reliably characterize local degradation response and uncertainty without introducing artifacts or instability during fusion.
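
To make the premise concrete, one simple way a posterior variance could steer fusion is a per-pixel gate that trusts temporally aggregated detail less where the variance is high. The snippet is a hypothetical illustration only: the gate formula, the `temperature` parameter, and the function name are assumptions and do not describe the paper's fusion module.

```python
import numpy as np

def uncertainty_gated_fusion(spatial, temporal, variance, temperature=1.0):
    """Blend a temporal estimate into a spatial one using a variance-driven gate.

    Hypothetical gate: 1 where variance is 0, approaching 0 as variance grows.
    """
    gate = 1.0 / (1.0 + variance / temperature)
    fused = gate * temporal + (1.0 - gate) * spatial
    return fused, gate
```

If the variance estimate is unreliable, a gate of this kind switches between the two sources for the wrong reasons, which is exactly the failure mode the premise has to rule out.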

What would settle it

Side-by-side 4K video frames in which UHD-GPGNet produces more visible artifacts or temporal instability than an implicit baseline on the same mixed-degradation sequence.

Figures

Figures reproduced from arXiv: 2604.11014 by Chen Wu, Dianjie Lu, Guijuan Zhang, Linwei Fan, Pengwen Dai, Wei Wang, Weiyuan He, Yongzhen Wang, Zhuoran Zheng.

Figure 1: Full-resolution 4K speed–quality–memory land…
Figure 2: Overall architecture of the proposed UHD video denoiser, including the Y/C/RGB stems, hierarchical…
Figure 3: Sparse GP-guided fusion module. The module forms local descriptors, pools inducing tokens, predicts…
Figure 4: Main qualitative comparison on UVG with one full 4K frame and enlarged crops emphasizing thin struc…
Figure 5: RealisVideo-4K robustness curves under con…
Figure 6: Representative 4K crop visualization on RealisVideo-4K at σ = 1.0, 2.0, and 3.0
Figure 7: Mechanism visualization: restored output,…
Figure 8: Real-world downstream detection on phone…
Original abstract

Ultra-high-definition (UHD) video denoising requires simultaneously suppressing complex spatio-temporal degradations, preserving fine textures and chromatic stability, and maintaining efficient full-resolution 4K deployment. In this paper, we propose UHD-GPGNet, a Gaussian-process-guided local spatio-temporal denoising framework that addresses these requirements jointly. Rather than relying on implicit feature learning alone, the method estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which then guide adaptive temporal-detail fusion. A structure-color collaborative reconstruction head decouples luminance, chroma, and high-frequency correction, while a heteroscedastic objective and overlap-tiled inference further stabilize optimization and enable memory-bounded 4K deployment. Experiments on UVG and RealisVideo-4K show that UHD-GPGNet achieves competitive restoration fidelity with substantially fewer parameters than existing methods, enables real-time full-resolution 4K inference with significant speedup over the closest quality competitor, and maintains robust performance across a multi-level mixed-degradation schedule. A real-world study on phone-captured 4K video further confirms that the model, trained entirely on synthetic degradation, generalizes to unseen real sensor noise and improves downstream object detection under challenging conditions.
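
The abstract does not define its heteroscedastic objective. A common form, assumed here purely for illustration, is the per-pixel Gaussian negative log-likelihood in which the network predicts a log-variance alongside each output value. The function name and argument names below are hypothetical.

```python
import numpy as np

def heteroscedastic_nll(pred, log_var, target):
    """Mean per-pixel Gaussian NLL with a predicted log-variance.

    The constant 0.5 * log(2 * pi) term is dropped. Pixels with a large
    predicted variance have their squared error down-weighted, while the
    log-variance term penalises claiming high uncertainty everywhere.
    """
    inv_var = np.exp(-log_var)
    return float(np.mean(0.5 * inv_var * (pred - target) ** 2 + 0.5 * log_var))
```

With `log_var` fixed at zero this reduces to half the mean squared error, and for a fixed residual the loss is minimised when the predicted variance equals the squared residual.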

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes UHD-GPGNet, a Gaussian-process-guided local spatio-temporal denoising framework for UHD video. It estimates sparse GP posterior statistics over compact spatio-temporal descriptors to explicitly characterize local degradation response and uncertainty, which guide adaptive temporal-detail fusion in a neural architecture. Additional components include a structure-color collaborative reconstruction head, a heteroscedastic objective, and overlap-tiled inference to enable memory-efficient real-time 4K processing. Experiments on UVG and RealisVideo-4K datasets claim competitive restoration fidelity with fewer parameters than prior methods, significant speedup for full-resolution 4K inference, robustness across multi-level mixed degradations, and generalization from synthetic training to real phone-captured 4K video noise, with benefits for downstream tasks like object detection.

Significance. If the central claims hold with supporting quantitative evidence, the work would offer a meaningful advance in efficient, uncertainty-aware UHD video denoising by combining explicit GP modeling with neural reconstruction, potentially improving parameter efficiency and cross-degradation robustness over purely implicit learning approaches.

major comments (2)
  1. [Abstract] Abstract: the headline claims of competitive fidelity, substantially fewer parameters, real-time 4K inference with speedup, and robust generalization across mixed degradations are asserted without any quantitative tables, metrics, error bars, or ablation results in the provided text, preventing verification of the performance advantages.
  2. [Method] Method (GP guidance description): the assertion that sparse GP posterior mean/variance statistics over spatio-temporal descriptors explicitly guide adaptive temporal-detail fusion without introducing temporal inconsistency or high-frequency artifacts lacks any derivation, kernel specification, or stability analysis under the heteroscedastic objective and overlap-tiled inference; this mechanism is load-bearing for attributing the claimed parameter efficiency and generalization to the GP component rather than the base network.
minor comments (1)
  1. [Abstract] Abstract: the final sentence on the real-world phone-captured study is informative but could be tightened to focus on the core technical contribution.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our submission. Below, we provide point-by-point responses to the major comments and indicate the revisions we will make to the manuscript.

  1. Referee: [Abstract] Abstract: the headline claims of competitive fidelity, substantially fewer parameters, real-time 4K inference with speedup, and robust generalization across mixed degradations are asserted without any quantitative tables, metrics, error bars, or ablation results in the provided text, preventing verification of the performance advantages.

    Authors: The abstract serves as a concise summary of the paper's contributions and results. The quantitative tables, metrics (such as PSNR and SSIM), error bars, ablation results, and comparisons demonstrating competitive fidelity, fewer parameters, real-time 4K inference, and generalization are all present in the Experiments section and supplementary material of the full manuscript. This allows readers to verify the claims. If the submission process led to only the abstract being reviewed, we can ensure the full document is provided. No changes to the abstract text are necessary as tables are not typically included there. revision: no

  2. Referee: [Method] Method (GP guidance description): the assertion that sparse GP posterior mean/variance statistics over spatio-temporal descriptors explicitly guide adaptive temporal-detail fusion without introducing temporal inconsistency or high-frequency artifacts lacks any derivation, kernel specification, or stability analysis under the heteroscedastic objective and overlap-tiled inference; this mechanism is load-bearing for attributing the claimed parameter efficiency and generalization to the GP component rather than the base network.

    Authors: The referee correctly identifies that the current description of the GP guidance could be more detailed. We will revise the Method section to include: (1) the specific kernel specification (e.g., the spatio-temporal RBF kernel), (2) a derivation of the sparse GP posterior mean and variance over the descriptors, and (3) a stability analysis under the heteroscedastic objective and overlap-tiled inference, including a discussion of why temporal inconsistency and high-frequency artifacts are avoided. This will better support the claims regarding parameter efficiency and generalization attributable to the GP component. We appreciate this suggestion for improving clarity. revision: yes
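
For orientation on response 2: the rebuttal names a spatio-temporal RBF kernel only as an example and gives no formulas. The block below states the textbook deterministic training conditional (DTC) sparse GP predictive equations with an RBF kernel, as one form the promised derivation could take. The symbols are assumptions introduced here (inducing inputs indexed by u, training descriptors by f, a query descriptor by *, signal variance \sigma_f^2, lengthscale \ell, noise variance \sigma_n^2), and nothing in the block is taken from the manuscript.

```latex
\begin{align}
k(x, x') &= \sigma_f^2 \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right) \\
\Sigma &= \left(K_{uu} + \sigma_n^{-2} K_{uf} K_{fu}\right)^{-1} \\
\mu_* &= \sigma_n^{-2}\, k_{*u}\, \Sigma\, K_{uf}\, y \\
\sigma_*^2 &= k_{**} - k_{*u} K_{uu}^{-1} k_{u*} + k_{*u}\, \Sigma\, k_{u*}
\end{align}
```

Only the m x m matrices K_{uu} and \Sigma are inverted, so the cost scales with the number of inducing inputs m rather than with the number of descriptors, which is what would make per-region GP statistics plausible at 4K resolution.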

Circularity Check

0 steps flagged

No circularity: derivation builds on standard GP and network components without self-referential reduction

The paper presents UHD-GPGNet as a framework that estimates sparse GP posterior statistics over spatio-temporal descriptors to guide adaptive fusion, using a structure-color reconstruction head, heteroscedastic objective, and overlap-tiled inference. No equations or derivations in the provided abstract or description reduce the claimed performance (fidelity, real-time 4K, robustness) directly to fitted parameters or prior self-citations by construction. The method is explicitly described as combining established GP posterior computation with neural components, without renaming known results, smuggling ansatzes via self-citation, or treating fitted inputs as predictions. The central claim remains independent of any load-bearing self-referential step, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only access prevents identification of specific free parameters, axioms, or invented entities; the GP component likely inherits standard kernel and posterior assumptions but details are absent.

pith-pipeline@v0.9.0 · 5545 in / 1080 out tokens · 28739 ms · 2026-05-10T16:38:58.154372+00:00 · methodology
