Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering

Feifei Shao; Jun Xiao; Lin Li; Long Chen; Weiquan Wang; Zhen Wang

arxiv: 2602.06343 · v2 · submitted 2026-02-06 · 💻 cs.CV

Weiquan Wang, Feifei Shao, Lin Li, Zhen Wang, Jun Xiao, Long Chen

Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords 4D Gaussian Splatting · monocular human rendering · occlusion handling · uncertainty estimation · probabilistic deformation · dynamic scene rendering · confidence-aware regularization

The pith

Modeling observation uncertainty in 4D Gaussian splatting enables robust rendering of occluded humans from monocular videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Rendering dynamic humans from a single video view breaks down when body parts are hidden. The paper treats the problem as finding the most probable 3D scene given inputs with varying levels of noise. It adds a probabilistic deformation network and a joint rasterization step that produces a per-pixel uncertainty map. This map automatically reduces the influence of bad observations during training and guides regularizations that keep geometry consistent across space and time where visual cues are missing. Experiments on standard occluded-motion datasets show improved fidelity and stability over prior methods.

Core claim

By reformulating monocular occluded human rendering as a maximum a posteriori estimation problem under heteroscedastic observation noise, U-4DGS integrates a Probabilistic Deformation Network and a Joint Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Confidence-Aware Regularizations then leverage the learned uncertainty to selectively propagate spatial-temporal validity and prevent geometric drift in regions lacking reliable visual cues.
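The gradient-modulation claim can be illustrated with a small NumPy sketch. It assumes the heteroscedastic objective takes the common Gaussian negative log-likelihood form with a predicted log-variance per pixel; the function names and the toy pixel values are hypothetical and are not taken from the paper.

```python
import numpy as np

def heteroscedastic_nll(pred, target, log_var):
    """Per-pixel Gaussian negative log-likelihood with a learned variance.

    pred, target: arrays of the same shape (rendered and observed intensities)
    log_var:      predicted log(sigma^2) per pixel (assumed parameterisation)
    """
    residual_sq = (pred - target) ** 2
    # The first term down-weights pixels with high predicted variance;
    # the second term penalises predicting high variance everywhere.
    return 0.5 * np.exp(-log_var) * residual_sq + 0.5 * log_var

def grad_wrt_pred(pred, target, log_var):
    """Analytic gradient of the loss with respect to the rendered pixel."""
    return np.exp(-log_var) * (pred - target)

# Toy example: the second pixel is badly explained (for instance, an occluder)
# and has been assigned a larger variance than the first.
pred = np.array([[0.5, 0.5]])
target = np.array([[0.4, 0.9]])
log_var = np.array([[0.0, np.log(16.0)]])

loss = heteroscedastic_nll(pred, target, log_var)
g = grad_wrt_pred(pred, target, log_var)
```

In this toy case the second pixel has a residual four times larger than the first, yet its gradient magnitude is smaller because it is divided by the predicted variance. That is the sense in which an uncertainty map can act as a gradient modulator.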

What carries the argument

Pixel-aligned uncertainty maps produced by the Joint Rasterization pipeline, which modulate gradients adaptively and inform Confidence-Aware Regularizations that selectively enforce spatial-temporal consistency.
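One plausible way to obtain a pixel-aligned uncertainty map is to composite a per-primitive uncertainty value with the same blending weights used for color. The sketch below shows this for a single pixel; it is an assumption about how joint rasterization might work, and `composite_ray` and its arguments are hypothetical names rather than the paper's implementation.

```python
import numpy as np

def composite_ray(alphas, colors, sigmas):
    """Front-to-back alpha compositing of color and uncertainty for one pixel.

    alphas: (N,)   opacity of each depth-sorted Gaussian at this pixel
    colors: (N, 3) RGB color of each Gaussian
    sigmas: (N,)   per-primitive uncertainty (assumed to be a scalar attribute)
    """
    # Transmittance before primitive i: T_i = prod_{j < i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas
    pixel_color = weights @ colors   # blended RGB
    pixel_sigma = weights @ sigmas   # blended uncertainty, same weights
    return pixel_color, pixel_sigma, weights

alphas = np.array([0.5, 0.5])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sigmas = np.array([0.2, 1.0])
pixel_color, pixel_sigma, weights = composite_ray(alphas, colors, sigmas)
```

Sharing the blending weights keeps the uncertainty value aligned with the primitives that actually contribute to the pixel, which is what allows it to be used per pixel in the photometric loss.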

If this is right

Unreliable observations produce fewer artifacts because their gradients are automatically down-weighted.
Geometric drift is reduced in occluded regions through uncertainty-guided propagation of spatial-temporal validity.
Rendering quality and temporal stability improve on datasets containing natural occlusions.
The same uncertainty mechanism can be applied to other 4D Gaussian splatting tasks with incomplete observations.
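To make the idea of uncertainty-guided propagation concrete, here is a minimal sketch of a confidence-weighted temporal smoothness term. This is one possible reading, not the paper's formulation: the weighting function, the tensor shapes, and the name `confidence_weighted_temporal_loss` are all assumptions chosen for illustration.

```python
import numpy as np

def confidence_weighted_temporal_loss(offsets, sigmas):
    """Penalise frame-to-frame offset changes, more strongly where uncertainty is high.

    offsets: (T, N, 3) per-frame position offsets of N Gaussians
    sigmas:  (T, N)    non-negative per-primitive uncertainty
    """
    diffs = offsets[1:] - offsets[:-1]      # (T-1, N, 3) temporal differences
    sq = np.sum(diffs ** 2, axis=-1)        # (T-1, N) squared motion per primitive
    # Assumed weighting: 0 for fully confident primitives, approaching 1
    # as uncertainty grows, so well-observed regions stay free to move.
    w = sigmas[1:] / (1.0 + sigmas[1:])
    return float(np.mean(w * sq))

offsets = np.zeros((2, 2, 3))
offsets[1, :, 0] = 1.0                      # both primitives move by 1 along x
sigmas = np.array([[0.0, 0.0], [0.0, 3.0]]) # only the second one is uncertain
loss = confidence_weighted_temporal_loss(offsets, sigmas)
```

Only the uncertain primitive contributes to the penalty, so the regularizer constrains poorly observed regions while leaving confident ones to follow the photometric signal.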

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be adapted to static scene reconstruction where parts of the environment are temporarily hidden.
Single-camera capture pipelines for animation or VR might become more practical if uncertainty handling reduces the need for multiple synchronized views.
Similar per-pixel uncertainty outputs could be tested on non-human dynamic objects such as animals or vehicles in monocular video.

Load-bearing premise

The learned uncertainty maps correctly flag unreliable observations so the regularizations can stop drift in hidden areas without creating new artifacts or over-smoothing visible parts.

What would settle it

A side-by-side comparison of rendered outputs against ground-truth geometry in heavily occluded frames from the ZJU-MoCap dataset would show whether uncertainty modulation reduces errors relative to versions without the uncertainty maps.
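A comparison of this kind could be scored with an error metric restricted to the occluded region. The sketch below computes PSNR inside a binary mask; it assumes ground-truth images and occlusion masks are available (as with synthetic occlusions), and `masked_psnr` is a hypothetical helper rather than the paper's evaluation code.

```python
import numpy as np

def masked_psnr(pred, gt, mask, max_val=1.0):
    """PSNR computed only over pixels where mask is True.

    pred, gt: (H, W) or (H, W, C) images with values in [0, max_val]
    mask:     (H, W) boolean or 0/1 array marking the region to evaluate
    """
    mask = mask.astype(bool)
    mse = np.mean((pred[mask] - gt[mask]) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inside the mask
    return float(10.0 * np.log10(max_val ** 2 / mse))

pred = np.zeros((2, 2))
gt = np.array([[0.1, 0.0], [0.0, 0.0]])
mask = np.array([[1, 0], [0, 0]])
score = masked_psnr(pred, gt, mask)
```

Running the same function on renders from the full model and from a variant without uncertainty maps, over the same masks, would give the side-by-side numbers the test calls for.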

Figures

Figures reproduced from arXiv: 2602.06343 by Feifei Shao, Jun Xiao, Lin Li, Long Chen, Weiquan Wang, Zhen Wang.

**Figure 1.** Performance, Fidelity, and Stability. (a) Our U-4DGS achieves the best trade-off between rendering quality and training…

**Figure 2.** The framework of U-4DGS. Left: The Probabilistic Deformation Network conditions Canonical Gaussians on time embedding γ(t) and pose θ_t to predict geometric offsets (Δr, Δμ, Δs) alongside per-primitive aleatoric uncertainty σ. Middle: The deformed Gaussians are transformed via LBS and rendered through a Joint Rasterization pipeline, simultaneously producing a photometric image and a pixel-aligned Uncertain…

**Figure 3.** Qualitative comparisons on novel view synthesis. Left: Results on the ZJU-MoCap dataset with synthetic occlusions.

**Figure 4.** Qualitative ablation study.

(1) Standard Human Rendering. To demonstrate the impact of occlusion on conventional pipelines, we evaluate HumanNeRF [39], GaussianAvatar [8], and GauHuman [9]. These methods are trained directly on the occluded sequences without specific handling mechanisms. (2) Occlusion-Aware Approaches. We compare against representative methods covering three mainstream technical paradigms: a…

**Figure 5.** Visualization of the learned Uncertainty Map.

Original abstract

High-fidelity rendering of dynamic humans from monocular videos typically degrades catastrophically under occlusions. Existing solutions incorporate external priors, either hallucinating missing content via generative models, which induces severe temporal flickering, or imposing rigid geometric heuristics that fail to capture diverse appearances. To this end, we reformulate the task as a Maximum A Posteriori estimation problem under heteroscedastic observation noise. In this paper, we propose U-4DGS, a framework integrating a Probabilistic Deformation Network and a Joint Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Furthermore, to prevent geometric drift in regions lacking reliable visual cues, we enforce Confidence-Aware Regularizations, which leverage the learned uncertainty to selectively propagate spatial-temporal validity. Extensive experiments on the ZJU-MoCap and OcMotion datasets demonstrate that U-4DGS achieves state-of-the-art rendering fidelity and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper adds learned uncertainty to 4D Gaussian Splatting via a probabilistic deformation network and joint rasterization to handle occlusions better, but the evidence that the uncertainty actually isolates unreliable observations is still moderate.

The main thing here is that U-4DGS reformulates monocular occluded human rendering as heteroscedastic MAP estimation inside 4DGS. It adds a Probabilistic Deformation Network and a joint rasterization step that produces pixel-aligned uncertainty maps to modulate gradients, then uses confidence-aware regularizations to keep geometry stable where observations are weak. This is a clear step past the generative priors that flicker and the rigid heuristics that ignore appearance variation.

Referee Report

2 major / 2 minor

Summary. The paper claims that monocular occluded human rendering can be improved by reformulating the problem as MAP estimation under heteroscedastic noise. It introduces U-4DGS, which combines a Probabilistic Deformation Network with a Joint Rasterization pipeline to produce pixel-aligned uncertainty maps that modulate gradients during optimization, plus Confidence-Aware Regularizations that use these maps to propagate spatial-temporal validity and prevent drift in occluded regions. Experiments on ZJU-MoCap and OcMotion datasets are reported to achieve state-of-the-art rendering fidelity and robustness without external generative priors or rigid heuristics.

Significance. If the uncertainty maps reliably isolate occlusion effects and the regularizations selectively stabilize geometry without over-smoothing, the work would advance 4D Gaussian Splatting by providing a data-driven mechanism for handling unreliable observations in dynamic human reconstruction. This could reduce reliance on generative hallucination or hand-crafted constraints, improving temporal consistency in real-world monocular capture scenarios.

major comments (2)

[Methods (Probabilistic Deformation Network and Joint Rasterization)] The MAP reformulation under heteroscedastic noise (abstract and methods) assumes the uncertainty head from Joint Rasterization produces maps that correctly down-weight occluded observations during photometric optimization. However, training uses only photometric losses plus the proposed regularizations with no explicit uncertainty supervision or occlusion masks mentioned; this risks the head converging to a trivial or correlated solution, undermining the adaptive gradient modulation and selective propagation claims.
[Experiments and Results] The SOTA claims on ZJU-MoCap and OcMotion rest on reported fidelity and robustness improvements, yet the abstract and results provide no quantitative error bars, standard deviations across runs, or detailed ablation tables isolating the contribution of the uncertainty maps versus the regularizations. This makes it difficult to assess whether the gains are statistically significant or robust to the central assumption.
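The statistical reporting requested in the second major comment amounts to summarizing each metric over repeated runs. A minimal sketch, assuming one score per independent training run; the function name is hypothetical and the numbers in the usage line are placeholders, not results from the paper.

```python
import numpy as np

def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    scores = np.asarray(scores, dtype=float)
    # ddof=1 gives the unbiased sample estimate, appropriate for a few seeds.
    return float(scores.mean()), float(scores.std(ddof=1))

# Placeholder values only, standing in for one metric from three seeds.
mean, std = summarize_runs([1.0, 2.0, 3.0])
```

With only a handful of seeds the standard deviation is itself noisy, so it indicates run-to-run spread rather than establishing statistical significance on its own.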

minor comments (2)

[Methods] Notation for the uncertainty maps and the heteroscedastic noise model should be introduced with explicit equations early in the methods to clarify how the variance term enters the loss.
[Figures] Figure captions for uncertainty visualizations should include quantitative metrics (e.g., correlation with ground-truth occlusion) rather than qualitative examples alone.
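The quantitative check suggested for the uncertainty figures could be as simple as a Pearson correlation between the predicted uncertainty map and a binary ground-truth occlusion mask. The sketch below assumes such a mask exists for the evaluated frame; `uncertainty_mask_correlation` is a hypothetical helper.

```python
import numpy as np

def uncertainty_mask_correlation(sigma_map, occ_mask):
    """Pearson correlation between per-pixel uncertainty and an occlusion mask.

    sigma_map: (H, W) predicted uncertainty
    occ_mask:  (H, W) binary mask, 1 where the subject is occluded
    Returns NaN if either input is constant.
    """
    s = np.asarray(sigma_map, dtype=float).ravel()
    m = np.asarray(occ_mask, dtype=float).ravel()
    return float(np.corrcoef(s, m)[0, 1])

sigma_map = np.array([[0.1, 0.1], [0.9, 0.9]])
occ_mask = np.array([[0, 0], [1, 1]])
r = uncertainty_mask_correlation(sigma_map, occ_mask)
```

A value near 1 would indicate that high uncertainty coincides with occluded pixels; a value near 0 would suggest the map is tracking something else, such as texture or pose error.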

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of our MAP formulation and experimental validation. We address each major point below and have revised the manuscript to strengthen the presentation of our approach.

Referee: [Methods (Probabilistic Deformation Network and Joint Rasterization)] The MAP reformulation under heteroscedastic noise (abstract and methods) assumes the uncertainty head from Joint Rasterization produces maps that correctly down-weight occluded observations during photometric optimization. However, training uses only photometric losses plus the proposed regularizations with no explicit uncertainty supervision or occlusion masks mentioned; this risks the head converging to a trivial or correlated solution, undermining the adaptive gradient modulation and selective propagation claims.

Authors: The uncertainty head is trained end-to-end as part of the heteroscedastic negative log-likelihood objective, where the per-pixel photometric loss is scaled inversely by the predicted uncertainty. This formulation, standard in probabilistic deep learning, naturally drives higher uncertainty predictions for pixels that cannot be explained well by the current model (e.g., occluded regions), without requiring explicit masks or supervision. The Confidence-Aware Regularizations further constrain the uncertainty field to be spatially and temporally coherent, mitigating the risk of trivial solutions such as uniform high uncertainty. We have added a derivation of the modulated gradient and additional uncertainty map visualizations in the revised methods and experiments sections to illustrate that the learned maps align with occlusion patterns. revision: partial
Referee: [Experiments and Results] The SOTA claims on ZJU-MoCap and OcMotion rest on reported fidelity and robustness improvements, yet the abstract and results provide no quantitative error bars, standard deviations across runs, or detailed ablation tables isolating the contribution of the uncertainty maps versus the regularizations. This makes it difficult to assess whether the gains are statistically significant or robust to the central assumption.

Authors: We agree that additional statistical reporting and component-wise ablations would strengthen the claims. In the revised manuscript we now report mean and standard deviation over three independent runs for all metrics in Tables 1 and 2. We have also expanded the ablation study (new Table 4 and supplementary figures) to isolate the uncertainty-modulated joint rasterization from the confidence-aware regularizations, confirming that both contribute measurably, with the uncertainty maps providing the largest gain on occluded sequences. revision: yes
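The first response's argument about trivial solutions can be made explicit with a short derivation. It assumes the per-pixel objective has the standard Gaussian negative log-likelihood form; the symbols $I_p$ (observed pixel), $\hat I_p$ (rendered pixel), and $\sigma_p^2$ (predicted variance) are notation chosen here, not necessarily the paper's.

```latex
\mathcal{L}_p
  = \frac{(I_p - \hat I_p)^2}{2\sigma_p^2} + \frac{1}{2}\log\sigma_p^2,
\qquad
\frac{\partial \mathcal{L}_p}{\partial \sigma_p^2}
  = -\frac{(I_p - \hat I_p)^2}{2\sigma_p^4} + \frac{1}{2\sigma_p^2} = 0
\;\Longrightarrow\;
\sigma_p^2 = (I_p - \hat I_p)^2 .
```

Under this form the loss-minimizing variance equals the squared residual, so uniformly high uncertainty is penalized by the log term wherever the residual is small. This does not by itself guarantee that large residuals correspond to occlusions rather than model error, which is the referee's remaining concern.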

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper reformulates monocular occluded human rendering as MAP estimation under heteroscedastic noise, then introduces a Probabilistic Deformation Network plus Joint Rasterization to output pixel-aligned uncertainty maps that modulate gradients, followed by Confidence-Aware Regularizations that use those maps. These are learned auxiliary outputs applied downstream rather than quantities defined in terms of the final rendering metric by construction. No equation reduces the claimed SOTA fidelity on ZJU-MoCap or OcMotion to a self-referential fit or self-citation chain; the central claims rest on empirical results and the architectural integration rather than tautological redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard probabilistic modeling assumptions and the 4D Gaussian Splatting representation; no new physical entities are postulated and no free parameters are explicitly named in the abstract.

axioms (1)

Domain assumption: Observation noise in monocular human video is heteroscedastic and can be captured by a learned per-pixel uncertainty map.
Invoked in the MAP estimation reformulation and used to modulate gradients and regularizations.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
cs.CV 2026-04 unverdicted novelty 8.0

DF3DV-1K supplies 1,048 scenes with clean and cluttered image pairs plus a challenging 41-scene subset to benchmark and improve distractor-free radiance field methods.

Reference graph

Works this paper leans on

[1] Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, and Youngjung Uh. 2024. Per-gaussian embedding-based deformation for deformable 3d gaussian splatting. In European Conference on Computer Vision. Springer, 321–335.

[2] Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. 2015. High-quality streamable free-viewpoint video. ACM Transactions on Graphics (ToG) 34, 4 (2015), 1–13.

[3] Jinlong Fan, Shanshan Zhao, Liang Zheng, Jing Zhang, Yuxiang Yang, and Mingming Gong. 2026. InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting. arXiv preprint arXiv:2601.02098 (2026).

[4] Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 2024. 3d gaussian splatting as new era: A survey. IEEE Transactions on Visualization and Computer Graphics (2024).

[5] Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Boni Hu, Linning Xu, Zhilin Pei, Hengjie Li, et al. 2025. Flashgs: Efficient 3d gaussian splatting for large-scale and high-resolution rendering. In Proceedings of the Computer Vision and Pattern Recognition Conference. 26652–26662.

[6] Lily Goli, Cody Reading, Silvia Sellán, Alec Jacobson, and Andrea Tagliasacchi. Bayes' rays: Uncertainty quantification for neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 20061–20070.

[7] Fengzhi Guo, Chih-Chuan Hsu, Sihao Ding, and Cheng Zhang. 2025. Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction. arXiv preprint arXiv:2510.12768 (2025).

[8] Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. 2024. Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 634–644.

[9] Shoukang Hu, Tao Hu, and Ziwei Liu. 2024. Gauhuman: Articulated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20418–20431.

[10] Buzhen Huang, Yuan Shu, Jingyi Ju, and Yangang Wang. 2022. Occluded human body capture with self-supervised spatial-temporal motion prior. arXiv preprint arXiv:2207.05375 (2022).

[11] Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, and Tony Tung. 2020. Arch: Animatable reconstruction of clothed humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3093–3102.

[12] Zekai Jiang, Tong Duan, and Dongyu Zhang. 2025. SymGaussian: Occluded Human Rendering with Multi-scale Symmetry Feature from Monocular Video. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5.

[13] Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems 30 (2017).

[14] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4 (2023), 139–1.

[15] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980

[16] Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. 2024. Hugs: Human gaussian splats. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 505–515.

[17] Inhee Lee, Byungjun Kim, and Hanbyul Joo. 2024. Guess the unseen: Dynamic 3d scene reconstruction from partial 2d glimpses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1062–1071.

[18] Sibaek Lee, Kyeongsu Kang, Seongbo Ha, and Hyeonwoo Yu. 2025. Bayesian NeRF: Quantifying uncertainty with volume density for neural implicit fields. IEEE Robotics and Automation Letters 10, 3 (2025), 2144–2151.

[19] Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, and Kostas Daniilidis. Gart: Gaussian articulated template models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 19876–19887.

[20] Chen Li, Jiahao Lin, and Gim Hee Lee. 2024. Ghunerf: Generalizable human nerf from a monocular video. In 2024 International Conference on 3D Vision (3DV). IEEE, 923–932.

[21] Deqi Li, Shi-Sheng Huang, Zhiyuan Lu, Xinran Duan, and Hua Huang. 2024. St-4dgs: Spatial-temporally consistent 4d gaussian splatting for efficient dynamic scene rendering. In ACM SIGGRAPH 2024 Conference Papers. 1–11.

[22] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. 2015. SMPL: a skinned multi-person linear model. ACM Transactions on Graphics (TOG) 34, 6 (2015), 1–16.

[23] Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. 2024. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 2024 International Conference on 3D Vision (3DV). IEEE, 800–809.

[24] Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. 2021. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 7210–7219.

[25] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019).

[26] Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. 2021. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF international conference on computer vision. 14314–14323.

[27] Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9054–9063.

[28] Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5020–5030.

[29] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. 2024. Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8931–8940.

[30] Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J Fleet, and Andrea Tagliasacchi. 2023. Robustnerf: Ignoring distractors with robust losses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20626–20636.

[31] Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. 2024. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1606–1616.

[32] Zhuo Su, Lan Xu, Zerong Zheng, Tao Yu, Yebin Liu, and Lu Fang. 2020. Robustfusion: Human volumetric capture with data-driven visual cues using a rgbd camera. In European Conference on Computer Vision. Springer, 246–264.

[33] Adam Sun, Tiange Xiang, Scott Delp, Li Fei-Fei, and Ehsan Adeli. 2024. Occfusion: Rendering occluded humans with generative diffusion priors. Advances in neural information processing systems 37 (2024), 92184–92209.

[34] Niko Sünderhauf, Jad Abou-Chakra, and Dimity Miller. 2023. Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9370–9376.

[35] Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, and Yan Lu. 2022. Neural capture of animatable 3d human from monocular video. In European Conference on Computer Vision. Springer, 275–291.

[36] Yating Tian, Hongwen Zhang, Yebin Liu, and Limin Wang. 2023. Recovering 3d human mesh from monocular images: A survey. IEEE transactions on pattern analysis and machine intelligence 45, 12 (2023), 15406–15425.

[37] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600–612.

[38] Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G Schwing, and Shenlong Wang. 2024. Gomavatar: Efficient animatable human modeling from monocular video using gaussians-on-mesh. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2059–2069.

[39] Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. 2022. Humannerf: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF conference on computer vision and pattern Recognition. 16210–16220.

[40] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 2024. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20310–20320.

[41] Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, and Ehsan Adeli. 2025. Rendering Humans behind Occlusions. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025).

[42] Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, and Li Fei-Fei. 2023. Rendering humans from object-occluded monocular videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3239–3250.

[43] Shuo Yang, Xiaoling Gu, Zhenzhong Kuang, Feiwei Qin, and Zizhao Wu. 2025. Innovative AI techniques for photorealistic 3D clothed human reconstruction from monocular images or videos: a survey. The Visual Computer 41, 6 (2025), 3973–4000.

[44] Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. 2024. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20331–20341.

[45] Jingrui Ye, Zhongkai Zhang, and Qingmin Liao. 2025. Occgaussian: 3d gaussian splatting for occluded human rendering. In Proceedings of the 2025 International Conference on Multimedia Retrieval. 1710–1719.

[46] Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, and Yebin Liu. 2021. Function4d: Real-time human volumetric capture from very sparse consumer rgbd sensors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5746–5756.

[47] Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin. 2023. Monohuman: Animatable human neural field from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16943–16953.

[48] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition. 586–595.

[49] Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, and Jun Zhang. 2025. Mega: Memory-efficient 4d gaussian splatting for dynamic scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 27828–27838.

[50] Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, and Shenghua Gao. 2025. Surfel-based Gaussian inverse rendering for fast and relightable dynamic human reconstruction from monocular videos. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025).


work page 2024

[25] [25]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. 2015. SMPL: a skinned multi-person linear model.ACM Trans- actions on Graphics (TOG)34, 6 (2015), 1–16

work page 2015

[26] [26]

Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. 2024. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In2024 International Conference on 3D Vision (3DV). IEEE, 800–809

work page 2024

[27] [27]

Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. 2021. Nerf in the wild: Neural radiance fields for unconstrained photo collections. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 7210–7219

work page 2021

[28] [28]

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library.Advances in neural information processing systems32 (2019)

work page 2019

[29] [29]

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. 2021. Animatable neural radiance fields for modeling dynamic human bodies. InProceedings of the IEEE/CVF international conference on computer vision. 14314–14323

work page 2021

[30] [30]

Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9054–9063

work page 2021

[31] [31]

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang

work page

[32] [32]

In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5020–5030

work page

[33] [33]

Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. 2024. Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8931–8940

work page 2024

[34] [34]

Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J Fleet, and Andrea Tagliasacchi. 2023. Robustnerf: Ignoring distractors with robust losses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20626–20636

work page 2023

[35] [35]

Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. 2024. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1606–1616

work page 2024

[36] [36]

Zhuo Su, Lan Xu, Zerong Zheng, Tao Yu, Yebin Liu, and Lu Fang. 2020. Robust- fusion: Human volumetric capture with data-driven visual cues using a rgbd camera. InEuropean Conference on Computer Vision. Springer, 246–264

work page 2020

[37] [37]

Adam Sun, Tiange Xiang, Scott Delp, Li Fei-Fei, and Ehsan Adeli. 2024. Occfusion: Rendering occluded humans with generative diffusion priors.Advances in neural information processing systems37 (2024), 92184–92209

work page 2024

[38] [38]

Niko Sünderhauf, Jad Abou-Chakra, and Dimity Miller. 2023. Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields. In2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9370–9376

work page 2023

[39] [39]

Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, and Yan Lu. 2022. Neural capture of animatable 3d human from monocular video. InEuropean Conference on Computer Vision. Springer, 275–291. xxxx, xx, xx Weiquan Wang, Feifei Shao, Lin Li, Zhen Wang, Jun Xiao, and Long Chen

work page 2022

[40] [40]

Yating Tian, Hongwen Zhang, Yebin Liu, and Limin Wang. 2023. Recovering 3d human mesh from monocular images: A survey.IEEE transactions on pattern analysis and machine intelligence45, 12 (2023), 15406–15425

work page 2023

[41] [41]

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing13, 4 (2004), 600–612

work page 2004

[42] [42]

Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G Schwing, and Shenlong Wang. 2024. Gomavatar: Efficient animatable human modeling from monocular video using gaussians-on-mesh. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2059–2069

work page 2024

[43] [43]

Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. 2022. Humannerf: Free-viewpoint rendering of moving people from monocular video. InProceedings of the IEEE/CVF conference on computer vision and pattern Recognition. 16210–16220

work page 2022

[44] [44]

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 2024. 4d gaussian splatting for real-time dynamic scene rendering. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20310–20320

work page 2024

[45] [45]

Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, and Ehsan Adeli. 2025. Rendering Humans behind Occlusions.IEEE Transactions on Pattern Analysis and Machine Intelligence(2025)

work page 2025

[46] [46]

Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, and Li Fei-Fei. 2023. Rendering humans from object-occluded monocular videos. InProceedings of the IEEE/CVF International Conference on Computer Vision. 3239–3250

work page 2023

[47] [47]

Shuo Yang, Xiaoling Gu, Zhenzhong Kuang, Feiwei Qin, and Zizhao Wu. 2025. Innovative AI techniques for photorealistic 3D clothed human reconstruction from monocular images or videos: a survey.The Visual Computer41, 6 (2025), 3973–4000

work page 2025

[48] [48]

Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. 2024. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20331–20341

work page 2024

[49] [49]

Jingrui Ye, Zhongkai Zhang, and Qingmin Liao. 2025. Occgaussian: 3d gaussian splatting for occluded human rendering. InProceedings of the 2025 International Conference on Multimedia Retrieval. 1710–1719

work page 2025

[50] [50]

Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, and Yebin Liu. 2021. Function4d: Real-time human volumetric capture from very sparse consumer rgbd sensors. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5746–5756

work page 2021

[51] [51]

Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin. 2023. Mono- human: Animatable human neural field from monocular video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16943– 16953

work page 2023

[52] [52]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang

work page

[53] [53]

InProceedings of the IEEE conference on computer vision and pattern recognition

The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition. 586–595

work page

[54] [54]

Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, and Jun Zhang. 2025. Mega: Memory- efficient 4d gaussian splatting for dynamic scenes. InProceedings of the IEEE/CVF International Conference on Computer Vision. 27828–27838

work page 2025

[55] [55]

Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, and Shenghua Gao. 2025. Surfel-based Gaussian inverse rendering for fast and relightable dynamic human reconstruction from monocular videos.IEEE Transactions on Pattern Analysis and Machine Intelligence(2025)

work page 2025