arxiv: 2602.06343 · v2 · submitted 2026-02-06 · 💻 cs.CV

Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering

Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords: 4D Gaussian Splatting, monocular human rendering, occlusion handling, uncertainty estimation, probabilistic deformation, dynamic scene rendering, confidence-aware regularization

The pith

Modeling observation uncertainty in 4D Gaussian splatting enables robust rendering of occluded humans from monocular videos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Rendering dynamic humans from a single video view breaks down when body parts are hidden. The paper treats the problem as finding the most probable 3D scene given inputs with varying levels of noise. It adds a probabilistic deformation network and a joint rasterization step that produces a per-pixel uncertainty map. This map automatically reduces the influence of bad observations during training and guides regularizations that keep geometry consistent across space and time where visual cues are missing. Experiments on standard occluded-motion datasets show improved fidelity and stability over prior methods.

Core claim

By reformulating monocular occluded human rendering as a maximum a posteriori estimation problem under heteroscedastic observation noise, U-4DGS integrates a Probabilistic Deformation Network and a Joint Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Confidence-Aware Regularizations then leverage the learned uncertainty to selectively propagate spatial-temporal validity and prevent geometric drift in regions lacking reliable visual cues.
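The summary above does not spell out the objective. A common form of a heteroscedastic Gaussian negative log-likelihood, following Kendall and Gal (cited in the reference list), is sketched below. The symbols $I_p$ (observed pixel), $\hat{I}_p$ (rendered pixel), and $\sigma_p$ (rendered per-pixel uncertainty) are assumed notation, and the paper's exact loss may differ.

```latex
\mathcal{L}_{\text{NLL}}
  = \sum_{p} \left[ \frac{\lVert I_p - \hat{I}_p \rVert^2}{2\sigma_p^2}
  + \frac{1}{2}\log \sigma_p^2 \right],
\qquad
\frac{\partial \mathcal{L}_{\text{NLL}}}{\partial \hat{I}_p}
  = \frac{\hat{I}_p - I_p}{\sigma_p^2}.
```

Under this form, the gradient reaching the renderer is the photometric residual divided by $\sigma_p^2$, which is one way to read the phrase "adaptive gradient modulator": pixels with large predicted uncertainty contribute proportionally smaller updates.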

What carries the argument

Pixel-aligned uncertainty maps produced by the Joint Rasterization pipeline, which modulate gradients adaptively and inform Confidence-Aware Regularizations that selectively enforce spatial-temporal consistency.
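How Joint Rasterization produces a pixel-aligned uncertainty map is not detailed in this summary. A minimal sketch, assuming the per-primitive uncertainty is blended with the same front-to-back alpha-compositing weights that standard Gaussian splatting uses for color, could look like the following. The function name `composite` and the scalar `sigmas` attribute are hypothetical, and a single color channel is used for brevity.

```python
def composite(alphas, colors, sigmas):
    """Front-to-back alpha compositing for one pixel.

    alphas: per-primitive opacity at this pixel, sorted near to far.
    colors: per-primitive scalar intensity (one channel for brevity).
    sigmas: per-primitive uncertainty (hypothetical attribute).
    Returns (pixel_color, pixel_uncertainty).
    """
    transmittance = 1.0
    pixel_color = 0.0
    pixel_sigma = 0.0
    for a, c, s in zip(alphas, colors, sigmas):
        weight = transmittance * a  # blending weight shared by both outputs
        pixel_color += weight * c
        pixel_sigma += weight * s
        transmittance *= 1.0 - a    # light remaining for primitives behind
    return pixel_color, pixel_sigma


# Two primitives: a confident bright one in front of an uncertain dark one.
color, sigma = composite([0.5, 0.5], [1.0, 0.0], [0.2, 0.8])
print(color, sigma)
```

Sharing the weights is what would keep the uncertainty map aligned with the image: a primitive that dominates a pixel's color also dominates that pixel's uncertainty.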

If this is right

  • Unreliable observations produce fewer artifacts because their gradients are automatically down-weighted.
  • Geometric drift is reduced in occluded regions through uncertainty-guided propagation of spatial-temporal validity.
  • Rendering quality and temporal stability improve on datasets containing natural occlusions.
  • The same uncertainty mechanism can be applied to other 4D Gaussian splatting tasks with incomplete observations.
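The first bullet can be illustrated numerically. The sketch below assumes a per-pixel Gaussian negative log-likelihood of the usual heteroscedastic form, residual squared over twice the variance plus a log-variance term, whose derivative with respect to the rendered value is the residual divided by the variance. The function name and the numbers are illustrative and not taken from the paper.

```python
def grad_wrt_render(residual: float, sigma: float) -> float:
    """Derivative of residual**2 / (2 * sigma**2) with respect to the
    rendered value, where residual = rendered - observed."""
    return residual / sigma ** 2


residual = 0.4  # identical photometric error at every pixel (illustrative)
grads = {sigma: grad_wrt_render(residual, sigma) for sigma in (0.5, 1.0, 2.0)}
for sigma, grad in grads.items():
    print(f"sigma={sigma}: gradient={grad:.3f}")
```

With the residual held fixed, doubling the predicted uncertainty cuts the gradient by a factor of four, so a pixel flagged as unreliable pulls on the geometry far less than a trusted one.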

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be adapted to static scene reconstruction where parts of the environment are temporarily hidden.
  • Single-camera capture pipelines for animation or VR might become more practical if uncertainty handling reduces the need for multiple synchronized views.
  • Similar per-pixel uncertainty outputs could be tested on non-human dynamic objects such as animals or vehicles in monocular video.

Load-bearing premise

The learned uncertainty maps correctly flag unreliable observations so the regularizations can stop drift in hidden areas without creating new artifacts or over-smoothing visible parts.

What would settle it

A side-by-side comparison of rendered outputs against ground-truth geometry in heavily occluded frames from the ZJU-MoCap dataset would show whether uncertainty modulation reduces errors relative to versions without the uncertainty maps.
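A geometry-level comparison needs ground-truth meshes, but an image-level proxy is easy to sketch: compute PSNR only over pixels marked as occluded, once for the full model and once for an ablation without uncertainty maps. The helper `masked_psnr`, the flat-list image layout, and the toy values below are assumptions for illustration and do not come from the paper.

```python
import math


def masked_psnr(pred, target, mask, max_val=1.0):
    """PSNR over the pixels where mask is truthy.

    pred, target: flat lists of intensities in [0, max_val].
    mask: flat list of 0/1 flags selecting the occluded pixels.
    """
    sq_errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not sq_errors:
        raise ValueError("mask selects no pixels")
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


target = [0.5, 0.5, 0.5, 0.5]
occluded = [0, 1, 1, 0]                    # only the middle pixels count
with_uncertainty = [0.5, 0.6, 0.4, 0.9]    # toy render, full model
without_uncertainty = [0.5, 0.7, 0.3, 0.9] # toy render, ablated model

psnr_full = masked_psnr(with_uncertainty, target, occluded)
psnr_ablated = masked_psnr(without_uncertainty, target, occluded)
print(psnr_full, psnr_ablated)
```

Restricting the metric to occluded pixels matters because whole-image averages can hide differences that only appear where observations are missing.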

Figures

Figures reproduced from arXiv: 2602.06343 by Feifei Shao, Jun Xiao, Lin Li, Long Chen, Weiquan Wang, Zhen Wang.

Figure 1: Performance, Fidelity, and Stability. (a) Our U-4DGS achieves the best trade-off between rendering quality and training…
Figure 2: The framework of U-4DGS. Left: The Probabilistic Deformation Network conditions Canonical Gaussians on time embedding 𝛾 (𝑡) and pose 𝜃𝑡 to predict geometric offsets (Δr, Δ𝜇, Δs) alongside per-primitive aleatoric uncertainty 𝜎. Middle: The deformed Gaussians are transformed via LBS and rendered through a Joint Rasterization pipeline, simultaneously producing a photometric image and a pixel-aligned Uncertain…
Figure 3: Qualitative comparisons on novel view synthesis. Left: Results on the ZJU-MoCap dataset with synthetic occlusions.
Figure 4: Qualitative ablation study.

(1) Standard Human Rendering. To demonstrate the impact of occlusion on conventional pipelines, we evaluate HumanNeRF [39], GaussianAvatar [8], and GauHuman [9]. These methods are trained directly on the occluded sequences without specific handling mechanisms. (2) Occlusion-Aware Approaches. We compare against representative methods covering three mainstream technical paradigms: a…
Figure 5: Visualization of the learned Uncertainty Map.
Original abstract

High-fidelity rendering of dynamic humans from monocular videos typically degrades catastrophically under occlusions. Existing solutions incorporate external priors, either hallucinating missing content via generative models, which induces severe temporal flickering, or imposing rigid geometric heuristics that fail to capture diverse appearances. To this end, we reformulate the task as a Maximum A Posteriori estimation problem under heteroscedastic observation noise. In this paper, we propose U-4DGS, a framework integrating a Probabilistic Deformation Network and a Joint Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Furthermore, to prevent geometric drift in regions lacking reliable visual cues, we enforce Confidence-Aware Regularizations, which leverage the learned uncertainty to selectively propagate spatial-temporal validity. Extensive experiments on the ZJU-MoCap and OcMotion datasets demonstrate that U-4DGS achieves state-of-the-art rendering fidelity and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that monocular occluded human rendering can be improved by reformulating the problem as MAP estimation under heteroscedastic noise. It introduces U-4DGS, which combines a Probabilistic Deformation Network with a Joint Rasterization pipeline to produce pixel-aligned uncertainty maps that modulate gradients during optimization, plus Confidence-Aware Regularizations that use these maps to propagate spatial-temporal validity and prevent drift in occluded regions. Experiments on ZJU-MoCap and OcMotion datasets are reported to achieve state-of-the-art rendering fidelity and robustness without external generative priors or rigid heuristics.

Significance. If the uncertainty maps reliably isolate occlusion effects and the regularizations selectively stabilize geometry without over-smoothing, the work would advance 4D Gaussian Splatting by providing a data-driven mechanism for handling unreliable observations in dynamic human reconstruction. This could reduce reliance on generative hallucination or hand-crafted constraints, improving temporal consistency in real-world monocular capture scenarios.

major comments (2)
  1. [Methods (Probabilistic Deformation Network and Joint Rasterization)] The MAP reformulation under heteroscedastic noise (abstract and methods) assumes the uncertainty head from Joint Rasterization produces maps that correctly down-weight occluded observations during photometric optimization. However, training uses only photometric losses plus the proposed regularizations with no explicit uncertainty supervision or occlusion masks mentioned; this risks the head converging to a trivial or correlated solution, undermining the adaptive gradient modulation and selective propagation claims.
  2. [Experiments and Results] The SOTA claims on ZJU-MoCap and OcMotion rest on reported fidelity and robustness improvements, yet the abstract and results provide no quantitative error bars, standard deviations across runs, or detailed ablation tables isolating the contribution of the uncertainty maps versus the regularizations. This makes it difficult to assess whether the gains are statistically significant or robust to the central assumption.
minor comments (2)
  1. [Methods] Notation for the uncertainty maps and the heteroscedastic noise model should be introduced with explicit equations early in the methods to clarify how the variance term enters the loss.
  2. [Figures] Figure captions for uncertainty visualizations should include quantitative metrics (e.g., correlation with ground-truth occlusion) rather than qualitative examples alone.
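The second minor comment asks for a quantitative link between uncertainty maps and ground-truth occlusion. One simple candidate metric is the Pearson correlation between per-pixel uncertainty and a binary occlusion mask, sketched below in plain Python. The function name, the flat-list layout, and the toy values are assumptions; the referee report does not prescribe a specific metric.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy example: uncertainty is high exactly where the mask marks occlusion.
uncertainty = [0.1, 0.2, 0.8, 0.9]
occluded = [0, 0, 1, 1]
r = pearson(uncertainty, occluded)
print(f"correlation with occlusion mask: {r:.3f}")
```

A value near 1 would support the claim that the maps track occlusion. A value near 0 would suggest the uncertainty head is responding to something else, which is the failure mode raised in the first major comment.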

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of our MAP formulation and experimental validation. We address each major point below and have revised the manuscript to strengthen the presentation of our approach.

Point-by-point responses
  1. Referee: [Methods (Probabilistic Deformation Network and Joint Rasterization)] The MAP reformulation under heteroscedastic noise (abstract and methods) assumes the uncertainty head from Joint Rasterization produces maps that correctly down-weight occluded observations during photometric optimization. However, training uses only photometric losses plus the proposed regularizations with no explicit uncertainty supervision or occlusion masks mentioned; this risks the head converging to a trivial or correlated solution, undermining the adaptive gradient modulation and selective propagation claims.

    Authors: The uncertainty head is trained end-to-end as part of the heteroscedastic negative log-likelihood objective, where the per-pixel photometric loss is scaled inversely by the predicted uncertainty. This formulation, standard in probabilistic deep learning, naturally drives higher uncertainty predictions for pixels that cannot be explained well by the current model (e.g., occluded regions), without requiring explicit masks or supervision. The Confidence-Aware Regularizations further constrain the uncertainty field to be spatially and temporally coherent, mitigating the risk of trivial solutions such as uniform high uncertainty. We have added a derivation of the modulated gradient and additional uncertainty map visualizations in the revised methods and experiments sections to illustrate that the learned maps align with occlusion patterns. revision: partial

  2. Referee: [Experiments and Results] The SOTA claims on ZJU-MoCap and OcMotion rest on reported fidelity and robustness improvements, yet the abstract and results provide no quantitative error bars, standard deviations across runs, or detailed ablation tables isolating the contribution of the uncertainty maps versus the regularizations. This makes it difficult to assess whether the gains are statistically significant or robust to the central assumption.

    Authors: We agree that additional statistical reporting and component-wise ablations would strengthen the claims. In the revised manuscript we now report mean and standard deviation over three independent runs for all metrics in Tables 1 and 2. We have also expanded the ablation study (new Table 4 and supplementary figures) to isolate the uncertainty-modulated joint rasterization from the confidence-aware regularizations, confirming that both contribute measurably, with the uncertainty maps providing the largest gain on occluded sequences. revision: yes
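The argument in the first response, that the likelihood objective raises uncertainty where the model cannot explain a pixel, can be checked on a single pixel. Assuming the standard Gaussian form r²/(2σ²) + log σ, setting the derivative with respect to σ to zero gives σ² = r², so the preferred uncertainty equals the size of the residual. The sketch below verifies this with a grid search. The function name and the grid are illustrative, and the paper's exact loss may differ.

```python
import math


def heteroscedastic_nll(residual: float, sigma: float) -> float:
    """Per-pixel Gaussian negative log-likelihood, up to an additive constant."""
    return residual ** 2 / (2.0 * sigma ** 2) + math.log(sigma)


residual = 0.4
candidates = [0.1 * k for k in range(1, 101)]  # sigma from 0.1 to 10.0
best_sigma = min(candidates, key=lambda s: heteroscedastic_nll(residual, s))
print(f"best sigma on the grid: {best_sigma:.1f}")

# The log term makes a uniformly large sigma costly rather than free.
print(heteroscedastic_nll(residual, best_sigma))
print(heteroscedastic_nll(residual, 10.0))
```

This supports the claim that uniform high uncertainty is penalized. It does not by itself show that large residuals come only from occlusion, which is the part of the referee's concern that the added visualizations are meant to address.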

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper reformulates monocular occluded human rendering as MAP estimation under heteroscedastic noise, then introduces a Probabilistic Deformation Network plus Joint Rasterization to output pixel-aligned uncertainty maps that modulate gradients, followed by Confidence-Aware Regularizations that use those maps. These are learned auxiliary outputs applied downstream rather than quantities defined in terms of the final rendering metric by construction. No equation reduces the claimed SOTA fidelity on ZJU-MoCap or OcMotion to a self-referential fit or self-citation chain; the central claims rest on empirical results and the architectural integration rather than tautological redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard probabilistic modeling assumptions and the 4D Gaussian Splatting representation; no new physical entities are postulated and no free parameters are explicitly named in the abstract.

axioms (1)
  • domain assumption: Observation noise in monocular human video is heteroscedastic and can be captured by a learned per-pixel uncertainty map.
    Invoked in the MAP estimation reformulation and used to modulate gradients and regularizations.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

    cs.CV · 2026-04 · unverdicted · novelty 8.0

    DF3DV-1K supplies 1,048 scenes with clean and cluttered image pairs plus a challenging 41-scene subset to benchmark and improve distractor-free radiance field methods.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, and Youngjung Uh. 2024. Per-gaussian embedding-based deformation for deformable 3d gaussian splatting. In European Conference on Computer Vision. Springer, 321–335.

  2. Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. 2015. High-quality streamable free-viewpoint video. ACM Transactions on Graphics (ToG) 34, 4 (2015), 1–13.

  3. Jinlong Fan, Shanshan Zhao, Liang Zheng, Jing Zhang, Yuxiang Yang, and Mingming Gong. 2026. InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting. arXiv preprint arXiv:2601.02098 (2026).

  4. Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 2024. 3d gaussian splatting as new era: A survey. IEEE Transactions on Visualization and Computer Graphics (2024).

  5. Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Boni Hu, Linning Xu, Zhilin Pei, Hengjie Li, et al. 2025. Flashgs: Efficient 3d gaussian splatting for large-scale and high-resolution rendering. In Proceedings of the Computer Vision and Pattern Recognition Conference. 26652–26662.

  6. Lily Goli, Cody Reading, Silvia Sellán, Alec Jacobson, and Andrea Tagliasacchi. Bayes’ rays: Uncertainty quantification for neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 20061–20070.

  7. Fengzhi Guo, Chih-Chuan Hsu, Sihao Ding, and Cheng Zhang. 2025. Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction. arXiv preprint arXiv:2510.12768 (2025).

  8. Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. 2024. Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 634–644.

  9. Shoukang Hu, Tao Hu, and Ziwei Liu. 2024. Gauhuman: Articulated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20418–20431.

  10. Buzhen Huang, Yuan Shu, Jingyi Ju, and Yangang Wang. 2022. Occluded human body capture with self-supervised spatial-temporal motion prior. arXiv preprint arXiv:2207.05375 (2022).

  11. Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, and Tony Tung. 2020. Arch: Animatable reconstruction of clothed humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3093–3102.

  12. Zekai Jiang, Tong Duan, and Dongyu Zhang. 2025. SymGaussian: Occluded Human Rendering with Multi-scale Symmetry Feature from Monocular Video. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5.

  13. Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems 30 (2017).

  14. Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4 (2023), 139–1.

  15. Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980

  16. Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. 2024. Hugs: Human gaussian splats. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 505–515.

  17. Inhee Lee, Byungjun Kim, and Hanbyul Joo. 2024. Guess the unseen: Dynamic 3d scene reconstruction from partial 2d glimpses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1062–1071.

  18. Sibaek Lee, Kyeongsu Kang, Seongbo Ha, and Hyeonwoo Yu. 2025. Bayesian NeRF: Quantifying uncertainty with volume density for neural implicit fields. IEEE Robotics and Automation Letters 10, 3 (2025), 2144–2151.

  19. Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, and Kostas Daniilidis. Gart: Gaussian articulated template models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 19876–19887.

  20. Chen Li, Jiahao Lin, and Gim Hee Lee. 2024. Ghunerf: Generalizable human nerf from a monocular video. In 2024 International Conference on 3D Vision (3DV). IEEE, 923–932.

  21. Deqi Li, Shi-Sheng Huang, Zhiyuan Lu, Xinran Duan, and Hua Huang. 2024. St-4dgs: Spatial-temporally consistent 4d gaussian splatting for efficient dynamic scene rendering. In ACM SIGGRAPH 2024 Conference Papers. 1–11.

  22. Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. 2015. SMPL: a skinned multi-person linear model. ACM Transactions on Graphics (TOG) 34, 6 (2015), 1–16.

  23. Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. 2024. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 2024 International Conference on 3D Vision (3DV). IEEE, 800–809.

  24. Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. 2021. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 7210–7219.

  25. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019).

  26. Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. 2021. Animatable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF international conference on computer vision. 14314–14323.

  27. Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9054–9063.

  28. Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5020–5030.

  29. Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. 2024. Nerf on-the-go: Exploiting uncertainty for distractor-free nerfs in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8931–8940.

  30. Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J Fleet, and Andrea Tagliasacchi. 2023. Robustnerf: Ignoring distractors with robust losses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20626–20636.

  31. Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. 2024. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1606–1616.

  32. Zhuo Su, Lan Xu, Zerong Zheng, Tao Yu, Yebin Liu, and Lu Fang. 2020. Robustfusion: Human volumetric capture with data-driven visual cues using a rgbd camera. In European Conference on Computer Vision. Springer, 246–264.

  33. Adam Sun, Tiange Xiang, Scott Delp, Li Fei-Fei, and Ehsan Adeli. 2024. Occfusion: Rendering occluded humans with generative diffusion priors. Advances in neural information processing systems 37 (2024), 92184–92209.

  34. Niko Sünderhauf, Jad Abou-Chakra, and Dimity Miller. 2023. Density-aware NeRF Ensembles: Quantifying Predictive Uncertainty in Neural Radiance Fields. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9370–9376.

  35. Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, and Yan Lu. 2022. Neural capture of animatable 3d human from monocular video. In European Conference on Computer Vision. Springer, 275–291.

  36. Yating Tian, Hongwen Zhang, Yebin Liu, and Limin Wang. 2023. Recovering 3d human mesh from monocular images: A survey. IEEE transactions on pattern analysis and machine intelligence 45, 12 (2023), 15406–15425.

  37. Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600–612.

  38. Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G Schwing, and Shenlong Wang. 2024. Gomavatar: Efficient animatable human modeling from monocular video using gaussians-on-mesh. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2059–2069.

  39. Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. 2022. Humannerf: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF conference on computer vision and pattern Recognition. 16210–16220.

  40. Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 2024. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20310–20320.

  41. Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, and Ehsan Adeli. 2025. Rendering Humans behind Occlusions. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025).

  42. Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, and Li Fei-Fei. 2023. Rendering humans from object-occluded monocular videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3239–3250.

  43. Shuo Yang, Xiaoling Gu, Zhenzhong Kuang, Feiwei Qin, and Zizhao Wu. 2025. Innovative AI techniques for photorealistic 3D clothed human reconstruction from monocular images or videos: a survey. The Visual Computer 41, 6 (2025), 3973–4000.

  44. Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. 2024. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 20331–20341.

  45. Jingrui Ye, Zhongkai Zhang, and Qingmin Liao. 2025. Occgaussian: 3d gaussian splatting for occluded human rendering. In Proceedings of the 2025 International Conference on Multimedia Retrieval. 1710–1719.

  46. Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, and Yebin Liu. 2021. Function4d: Real-time human volumetric capture from very sparse consumer rgbd sensors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5746–5756.

  47. Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin. 2023. Monohuman: Animatable human neural field from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16943–16953.

  48. Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition. 586–595.

  49. Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, and Jun Zhang. 2025. Mega: Memory-efficient 4d gaussian splatting for dynamic scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 27828–27838.

  50. Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, and Shenghua Gao. 2025. Surfel-based Gaussian inverse rendering for fast and relightable dynamic human reconstruction from monocular videos. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025).