Physically Plausible Human-Object Rendering from Sparse Views via 3D Gaussian Splatting

Jun Xiao; Long Chen; Weiquan Wang; Yi Yang; Yueting Zhuang

arxiv: 2503.09640 · v2 · submitted 2025-03-12 · 💻 cs.GR · cs.CV


Pith reviewed 2026-05-23 01:02 UTC · model grok-4.3

classification 💻 cs.GR cs.CV

keywords 3D Gaussian Splatting · Human-Object Interaction · Sparse View Rendering · Physical Plausibility · Dynamic Gaussians · Pose Refinement · Contact Prediction · Geometric Constraints


The pith

HOGS renders physically plausible human-object interactions from sparse views by optimizing dynamic 3D Gaussians with contact constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HOGS, a framework that represents humans and objects as dynamic 3D Gaussians and optimizes them directly to enforce geometric consistency. This prevents inter-penetration or floating contacts while producing high-quality renderings from limited camera inputs. Two supporting pre-trained modules refine human poses under occlusion and predict contact regions to guide the optimization losses. Experiments on interaction datasets show the method reaches state-of-the-art visual quality at high speed.

Core claim

HOGS represents both humans and objects as dynamic 3D Gaussians. A novel optimization process operates directly on these Gaussians to enforce geometric consistency, preventing inter-penetration or floating contacts, thereby achieving physical plausibility. Two pre-trained modules—an optimization-guided Human Pose Refiner and a Human-Object Contact Predictor—supply accurate pose and contact estimates to support the optimization under sparse-view ambiguity.

What carries the argument

Dynamic 3D Gaussians optimized via contact and separation losses, guided by a Human Pose Refiner and Human-Object Contact Predictor.
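A minimal sketch of what contact and separation terms of this kind could look like when applied to Gaussian centers. The formulas, the `margin` value, and every function name below are illustrative assumptions, not the paper's definitions. In particular, the separation term here is a simple proximity hinge standing in for a true penetration test, and `contact_mask` is a hypothetical stand-in for the contact predictor's output.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (N, 3) and b (M, 3), shape (N, M)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def contact_loss(human_pts, object_pts, contact_mask):
    """Mean nearest-object distance of human centers flagged as in contact.

    Minimizing this pulls predicted contact regions onto the object,
    which discourages floating contacts. (Assumed form, not the paper's.)
    """
    if not contact_mask.any():
        return 0.0
    d = pairwise_dist(human_pts[contact_mask], object_pts)
    return float(d.min(axis=1).mean())

def separation_loss(human_pts, object_pts, contact_mask, margin=0.02):
    """Hinge penalty on non-contact centers closer than `margin` to the object.

    A proximity proxy for inter-penetration; `margin` is a hypothetical value.
    """
    free = ~contact_mask
    if not free.any():
        return 0.0
    d = pairwise_dist(human_pts[free], object_pts)
    return float(np.maximum(margin - d.min(axis=1), 0.0).mean())

# Toy data: three human centers, one object center at the origin.
human = np.array([[0.0, 0.0, 0.10], [0.0, 0.0, 0.01], [1.0, 0.0, 0.0]])
obj = np.array([[0.0, 0.0, 0.0]])
mask = np.array([True, False, False])  # only the first center is "in contact"

l_contact = contact_loss(human, obj, mask)
l_sep = separation_loss(human, obj, mask)
```

In this toy setup the contact term measures the 0.10 gap of the flagged center, and the separation term penalizes only the unflagged center that sits inside the margin.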

If this is right

Enables rendering of human-object and hand-object scenes from sparse views while maintaining physical plausibility.
Achieves state-of-the-art rendering quality alongside high computational efficiency.
The direct Gaussian optimization prevents inter-penetration and enforces proper contact without post-processing.
The framework supports both full-body and hand-scale interactions on existing datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Gaussian representation and loss structure could apply to other dynamic scenes requiring geometric constraints, such as multi-object stacking.
If the contact predictor generalizes, the method may reduce reliance on dense views in real-world capture setups.
Efficiency gains suggest possible use in interactive applications where both realism and speed matter.
Failure modes in the refiner module would likely appear first under heavy occlusion or unusual poses.

Load-bearing premise

The pre-trained pose refiner and contact predictor modules produce sufficiently accurate estimates from sparse views to guide the losses without introducing new errors.

What would settle it

The claim would be undermined if rendered outputs on the test datasets exhibit interpenetrations or floating contacts where ground-truth interactions show touching, or if rendering metrics fall below those of prior methods.

Figures

Figures reproduced from arXiv: 2503.09640 by Jun Xiao, Long Chen, Weiquan Wang, Yi Yang, Yueting Zhuang.

**Figure 2.** HOGS pipeline. Given some sparse views of a dynamic HOI scene, HOGS first deforms human and object representations using a Human-Object Deformation process, which includes LBS for humans and rigid transformations for objects, along with a Human Pose Refinement module to enhance target pose accuracy. Deformed human and object Gaussians are then composed into a unified 3D space to form the Composed Gaussian …

**Figure 3.** Illustration of sparse-view human pose refinement.

**Figure 4.** The workflow of sparse-view contact prediction.

**Figure 5.** Qualitative evaluation of novel view synthesis for HOI rendering on the HODome dataset.

**Figure 6.** Extensibility of HOGS.

**Figure 7.** Effect of physical loss. (a) Without physical loss, the rendered human floats above the chair, exhibiting a lack of physical contact. (b) Incorporating physical loss results in plausible contact and a more physically consistent rendering.

…diction module leads to a slight decrease in rendering quality but a significant increase in rendering efficiency (Table 2). This phenomenon arises from the focused sc…
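The deformation step named in the Figure 2 caption (LBS for humans, rigid transformations for objects, then composition into one space) can be sketched on Gaussian centers alone. This is an illustrative sketch under stated assumptions: the function names, the toy skinning weights, and the joint transforms are hypothetical, and a full implementation would also have to transform each Gaussian's rotation and covariance, which is omitted here.

```python
import numpy as np

def lbs(points, weights, joint_transforms):
    """Linear blend skinning of Gaussian centers.

    points: (N, 3) canonical centers
    weights: (N, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous per-joint transforms
    """
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    # Blend the per-joint matrices for every point, then apply them.
    blended = np.einsum("nj,jab->nab", weights, joint_transforms)  # (N, 4, 4)
    return np.einsum("nab,nb->na", blended, homo)[:, :3]

def rigid(points, R, t):
    """Rigid transform of object centers: rotate by R (3, 3), translate by t (3,)."""
    return points @ R.T + t

# Human: one center influenced equally by two joints.
identity = np.eye(4)
lift = np.eye(4)
lift[2, 3] = 1.0  # second joint translates by +1 along z
human_def = lbs(
    np.array([[0.0, 0.0, 0.0]]),
    np.array([[0.5, 0.5]]),
    np.stack([identity, lift]),
)

# Object: 90-degree rotation about z, then a shift along z.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
obj_def = rigid(np.array([[1.0, 0.0, 0.0]]), R, np.array([0.0, 0.0, 2.0]))

# Compose both sets of centers into one shared 3D space.
composed = np.concatenate([human_def, obj_def], axis=0)
```

Blending the two joint transforms with equal weights moves the human center halfway along the second joint's translation, while the object center is rotated and shifted as a single rigid body.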

Original abstract

Rendering realistic human-object interactions (HOIs) from sparse-view inputs is a challenging yet crucial task for various real-world applications. Existing methods often struggle to simultaneously achieve high rendering quality, physical plausibility, and computational efficiency. To address these limitations, we propose HOGS (Human-Object Rendering via 3D Gaussian Splatting), a novel framework for efficient HOI rendering with physically plausible geometric constraints from sparse views. HOGS represents both humans and objects as dynamic 3D Gaussians. Central to HOGS is a novel optimization process that operates directly on these Gaussians to enforce geometric consistency (i.e., preventing inter-penetration or floating contacts) to achieve physical plausibility. To support this core optimization under sparse-view ambiguity, our framework incorporates two pre-trained modules: an optimization-guided Human Pose Refiner for robust estimation under sparse-view occlusions, and a Human-Object Contact Predictor that efficiently identifies interaction regions to guide our novel contact and separation losses. Extensive experiments on both human-object and hand-object interaction datasets demonstrate that HOGS achieves state-of-the-art rendering quality and maintains high computational efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HOGS optimizes dynamic 3D Gaussians with contact losses guided by two fixed pre-trained modules, but the physical plausibility claim rests on those modules working accurately under the sparse occlusions the method targets.


The paper introduces HOGS, which represents humans and objects as dynamic 3D Gaussians and adds a direct optimization step on those Gaussians to enforce geometric consistency via contact and separation losses. Two pre-trained networks supply the pose refinement and contact regions that drive those losses. This specific integration for sparse-view human-object rendering is the main new element; earlier Gaussian splatting work did not combine these pieces in the same way for HOI scenes.

Referee Report

1 major / 0 minor

Summary. The manuscript presents HOGS, a framework for rendering human-object interactions (HOIs) from sparse views. It represents humans and objects as dynamic 3D Gaussians and performs optimization directly on these Gaussians to enforce physical plausibility via novel contact and separation losses that prevent inter-penetration and floating contacts. Two fixed pre-trained modules—an optimization-guided Human Pose Refiner and a Human-Object Contact Predictor—supply the contact regions and refined poses that guide the losses under sparse-view ambiguity. Experiments on human-object and hand-object interaction datasets are reported to achieve state-of-the-art rendering quality while maintaining computational efficiency.

Significance. If the pre-trained modules prove reliable under the targeted sparse-view occlusions, the approach could advance efficient, physically constrained rendering of interactions by combining dynamic 3D Gaussian Splatting with geometric losses. The direct optimization on Gaussians and the use of contact-aware terms address limitations in prior methods. The significance is limited, however, by the absence of independent validation for the modules that supply the physical constraints.

major comments (1)

[Sections 3.3 and 3.4] The contact and separation losses are defined directly on the outputs of the fixed pre-trained Human Pose Refiner and Human-Object Contact Predictor. No ablation studies isolate the accuracy of these modules (pose error, contact precision/recall) on sparse-view data against ground truth. Because the modules remain frozen during Gaussian optimization, errors they produce under occlusion would propagate into the physical-plausibility constraints with no independent recovery mechanism, undermining the central claim that the optimization enforces geometric consistency.
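The contact precision/recall check the referee asks for could be computed per point as sketched below. This assumes contact is represented as a binary label per human point (or vertex), which is an assumption about the data format rather than something the source states; the function name and the toy labels are hypothetical.

```python
import numpy as np

def contact_precision_recall(pred, gt):
    """Precision and recall of predicted binary contact labels against ground truth."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)    # predicted contact, truly in contact
    fp = np.sum(pred & ~gt)   # predicted contact, truly free
    fn = np.sum(~pred & gt)   # missed contact
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

# Toy labels for five points.
pred_labels = [1, 1, 0, 0, 1]
gt_labels = [1, 0, 0, 1, 1]
precision, recall = contact_precision_recall(pred_labels, gt_labels)
```

Low precision would mean the contact loss pulls free body parts onto the object, while low recall would leave true contacts unconstrained, so the two numbers map onto the two failure modes the comment describes.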

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below.


Referee: Sections 3.3 and 3.4: The contact and separation losses are defined directly on the outputs of the fixed pre-trained Human Pose Refiner and Human-Object Contact Predictor. No ablation studies isolate the accuracy of these modules (pose error, contact precision/recall) on sparse-view data against ground truth. Because the modules remain frozen during Gaussian optimization, errors they produce under occlusion would propagate into the physical-plausibility constraints with no independent recovery mechanism, undermining the central claim that the optimization enforces geometric consistency.

Authors: We acknowledge that the current manuscript does not include isolated ablation studies evaluating the pose error or contact precision/recall of the fixed pre-trained Human Pose Refiner and Human-Object Contact Predictor specifically on sparse-view inputs against ground truth. The modules are indeed held fixed during the Gaussian optimization, as stated in Sections 3.3 and 3.4, so any inaccuracies under heavy occlusion would directly influence the contact and separation losses. Our defense of the central claim rests on the end-to-end experimental results: HOGS achieves state-of-the-art rendering quality and physical-plausibility metrics on both human-object and hand-object datasets, outperforming baselines that lack these geometric constraints. This indicates that the overall optimization produces plausible outputs in practice. To strengthen the presentation, we will add the requested module-level ablations (pose error and contact metrics on sparse-view test data) to the revised manuscript.

revision: yes
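For the pose-error half of the promised ablation, one common choice is mean per-joint position error (MPJPE). The source does not say which pose metric the authors would report, so treating MPJPE as the metric is an assumption, and the function name and toy joints below are hypothetical.

```python
import numpy as np

def mpjpe(pred_joints, gt_joints):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred_joints, gt_joints: arrays of shape (J, 3) in the same units.
    """
    pred_joints = np.asarray(pred_joints, dtype=float)
    gt_joints = np.asarray(gt_joints, dtype=float)
    return float(np.linalg.norm(pred_joints - gt_joints, axis=-1).mean())

# Toy example: two joints, one offset by a 3-4-5 triangle scaled to 0.05.
gt_pose = np.zeros((2, 3))
pred_pose = np.array([[0.03, 0.04, 0.0], [0.0, 0.0, 0.0]])
pose_error = mpjpe(pred_pose, gt_pose)
```

Reporting this number with and without the refiner, on the same sparse-view test frames, would isolate the module's contribution in the way the referee requests.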

Circularity Check

0 steps flagged

No circularity: framework uses external pre-trained modules and novel losses


The paper presents HOGS as a forward proposal that represents humans and objects as dynamic 3D Gaussians and introduces a new optimization process with contact and separation losses. These losses are guided by two explicitly pre-trained modules (Human Pose Refiner and Human-Object Contact Predictor) described as fixed inputs. No derivation, equation, or claim in the provided text reduces a performance quantity to a fitted parameter from the same data, renames a known result, or relies on a load-bearing self-citation chain. The method is therefore self-contained against external benchmarks and the central rendering claims do not collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all technical details remain opaque.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rendering Multi-Human and Multi-Object with 3D Gaussian Splatting
cs.CV 2026-04 unverdicted novelty 5.0

MM-GS combines per-instance multi-view fusion with scene-level interaction modeling on 3D Gaussians to render high-fidelity multi-human multi-object scenes from sparse views.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · cited by 1 Pith paper

[1]

Differentiable render- ing of neural sdfs through reparameterization

Sai Praveen Bangaru, Michael Gharbi, Fujun Luan, Tzu-Mao Li, Kalyan Sunkavalli, Milos Hasan, Sai Bi, Zexiang Xu, Gilbert Bernstein, and Fredo Durand. Differentiable render- ing of neural sdfs through reparameterization. InSIGGRAPH Asia 2022 Conference Papers, pages 1–9, 2022. 3

work page 2022
[2]

4d visualization of dynamic events from unconstrained multi-view videos

Aayush Bansal, Minh V o, Yaser Sheikh, Deva Ramanan, and Srinivasa Narasimhan. 4d visualization of dynamic events from unconstrained multi-view videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5366–5375, 2020. 2

work page 2020
[3]

Interaction networks for learning about objects, relations and physics

Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. Advances in neural in- formation processing systems, 29, 2016. 3

work page 2016
[4]

Method for registration of 3-d shapes

Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Sensor fusion IV: control paradigms and data structures, pages 586–606. Spie, 1992. 5

work page 1992
[5]

Behave: Dataset and method for tracking human object in- teractions

Bharat Lal Bhatnagar, Xianghui Xie, Ilya A Petrov, Cristian Sminchisescu, Christian Theobalt, and Gerard Pons-Moll. Behave: Dataset and method for tracking human object in- teractions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15935– 15946, 2022. 2

work page 2022
[6]

Keep it smpl: Automatic estimation of 3d human pose and shape from a single image

Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J Black. Keep it smpl: Automatic estimation of 3d human pose and shape from a single image. In Computer Vision–ECCV 2016: 14th Euro- pean Conference, Amsterdam, The Netherlands, October 11- 14, 2016, Proceedings, Part V 14, pages 561–578. Springer,

work page 2016
[7]

Flashback: Immersive virtual reality on mobile devices via rendering memoization

Kevin Boos, David Chu, and Eduardo Cuervo. Flashback: Immersive virtual reality on mobile devices via rendering memoization. In Proceedings of the 14th Annual Interna- tional Conference on Mobile Systems, Applications, and Ser- vices, pages 291–304, 2016. 2

work page 2016
[8]

Hexplane: A fast representa- tion for dynamic scenes

Ang Cao and Justin Johnson. Hexplane: A fast representa- tion for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 130–141, 2023. 3

work page 2023
[9]

Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images

Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2025. 2

work page 2025
[10]

High-quality streamable free-viewpoint video

Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Den- nis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. High-quality streamable free-viewpoint video. ACM Transactions on Graphics (ToG) , 34(4):1–13,

work page
[11]

point diffu- sion implicit function for large-scale scene neural represen- tation

Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu, Gang Yu, and Tao Chen. point diffu- sion implicit function for large-scale scene neural represen- tation. Advances in Neural Information Processing Systems, 36, 2024. 3

work page 2024
[12]

Motion2fusion: Real-time volumetric performance capture

Mingsong Dou, Philip Davidson, Sean Ryan Fanello, Sameh Khamis, Adarsh Kowdle, Christoph Rhemann, Vladimir Tankovich, and Shahram Izadi. Motion2fusion: Real-time volumetric performance capture. ACM Transactions on Graphics (ToG), 36(6):1–16, 2017. 2

work page 2017
[13]

3d gaussian splatting as new era: A survey

Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 3d gaussian splatting as new era: A survey. IEEE Transactions on Visualization and Computer Graphics, 2024. 2

work page 2024
[14]

Associated reality: A cognitive human–machine layer for autonomous driving

Felipe Fernandez, Angel Sanchez, Jose F Velez, and Belen Moreno. Associated reality: A cognitive human–machine layer for autonomous driving. Robotics and Autonomous Systems, 133:103624, 2020. 1

work page 2020
[15]

Miruoto: Sports event atmo- sphere visual rendering through real-time image and sound processing system

Guillaume Gourmelen, Shutaro Toriya, Eiko Miya, Naohisa Shioura, and Hiroyasu Iwata. Miruoto: Sports event atmo- sphere visual rendering through real-time image and sound processing system. In ACM SIGGRAPH 2024 Emerging Technologies, pages 1–2. 2024. 1

work page 2024
[16]

Observing human-object interactions: Using spatial and functional compatibility for recognition

Abhinav Gupta, Aniruddha Kembhavi, and Larry S Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. IEEE transactions on pattern analysis and machine intelligence , 31(10):1775– 1789, 2009. 3

work page 2009
[17]

Resolving 3d human pose ambiguities with 3d scene constraints

Mohamed Hassan, Vasileios Choutas, Dimitrios Tzionas, and Michael J Black. Resolving 3d human pose ambiguities with 3d scene constraints. In Proceedings of the IEEE/CVF international conference on computer vision , pages 2282– 2292, 2019. 3

work page 2019
[18]

Populating 3d scenes by learning human-scene interaction

Mohamed Hassan, Partha Ghosh, Joachim Tesch, Dim- itrios Tzionas, and Michael J Black. Populating 3d scenes by learning human-scene interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14708–14718, 2021. 3

work page 2021
[19]

Hand-object interaction controller (hoic): Deep reinforce- ment learning for reconstructing interactions with physics

Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, and Feng Xu. Hand-object interaction controller (hoic): Deep reinforce- ment learning for reconstructing interactions with physics. In ACM SIGGRAPH 2024 Conference Papers , pages 1–10,

work page 2024
[20]

Gauhuman: Articu- lated gaussian splatting from monocular human videos

Shoukang Hu, Tao Hu, and Ziwei Liu. Gauhuman: Articu- lated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20418–20431, 2024. 2, 3, 4

work page 2024
[21]

Capturing and inferring dense full-body human-scene contact

Chun-Hao P Huang, Hongwei Yi, Markus H ¨oschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, and Michael J Black. Capturing and inferring dense full-body human-scene contact. In Proceedings of 9 the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13274–13285, 2022. 1

work page 2022
[22]

Arch: Animatable reconstruction of clothed hu- mans

Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, and Tony Tung. Arch: Animatable reconstruction of clothed hu- mans. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 3093–3102,

work page
[23]

Interactive synthesis of human- object interaction

Sumit Jain and C Karen Liu. Interactive synthesis of human- object interaction. In Proceedings of the 2009 ACM SIG- GRAPH/Eurographics Symposium on Computer Animation, pages 47–53, 2009. 3

work page 2009
[24]

Flexnerf: Photorealistic free- viewpoint rendering of moving humans from sparse views

Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, and Larry S Davis. Flexnerf: Photorealistic free- viewpoint rendering of moving humans from sparse views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21118–21127, 2023. 2

work page 2023
[25]

Neuralhofu- sion: Neural volumetric rendering under human-object in- teractions

Yuheng Jiang, Suyi Jiang, Guoxing Sun, Zhuo Su, Kai- wen Guo, Minye Wu, Jingyi Yu, and Lan Xu. Neuralhofu- sion: Neural volumetric rendering under human-object in- teractions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 6155– 6165, 2022. 2

work page 2022
[26]

Instant-nvr: Instant neural volumetric ren- dering for human-object interactions from monocular rgbd stream

Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, and Lan Xu. Instant-nvr: Instant neural volumetric ren- dering for human-object interactions from monocular rgbd stream. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 595–605,

work page
[27]

End-to-end recovery of human shape and pose

Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 7122–7131, 2018. 4, 14

work page 2018
[28]

3d gaussian splatting for real-time radiance field rendering

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1,

work page
[29]

Hugs: Human gaussian splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. Hugs: Human gaussian splats. In Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition , pages 505–515, 2024. 2, 3, 4

work page 2024
[30]

Human action recognition and predic- tion: A survey

Yu Kong and Yun Fu. Human action recognition and predic- tion: A survey. International Journal of Computer Vision , 130(5):1366–1401, 2022. 1

work page 2022
[31]

Gen- eralizable human gaussians for sparse view synthesis

Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella- Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, et al. Gen- eralizable human gaussians for sparse view synthesis. In European Conference on Computer Vision, pages 451–468. Springer, 2025. 3

work page 2025
[32]

Gp- nerf: Generalized perception nerf for context-aware 3d scene understanding

Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, and Junwei Han. Gp- nerf: Generalized perception nerf for context-aware 3d scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21708– 21718, 2024. 3

work page 2024
[33]

Task-oriented human-object interactions generation with implicit neural representations

Quanzhou Li, Jingbo Wang, Chen Change Loy, and Bo Dai. Task-oriented human-object interactions generation with implicit neural representations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3035–3044, 2024. 1

work page 2024
[34]

Para- metric model-based 3d human shape and pose estimation from multiple views

Zhongguo Li, Anders Heyden, and Magnus Oskarsson. Para- metric model-based 3d human shape and pose estimation from multiple views. In Image Analysis: 21st Scandinavian Conference, SCIA 2019, Norrk ¨oping, Sweden, June 11–13, 2019, Proceedings 21, pages 336–347. Springer, 2019. 4

work page 2019
[35]

Learning implicit templates for point-based clothed human modeling

Siyou Lin, Hongwen Zhang, Zerong Zheng, Ruizhi Shao, and Yebin Liu. Learning implicit templates for point-based clothed human modeling. In European Conference on Com- puter Vision, pages 210–228. Springer, 2022. 4

work page 2022
[36]

Hosnerf: Dynamic human-object-scene neural ra- diance fields from a single video

Jia-Wei Liu, Yan-Pei Cao, Tianyuan Yang, Zhongcong Xu, Jussi Keppo, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Hosnerf: Dynamic human-object-scene neural ra- diance fields from a single video. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 18483–18494, 2023. 2, 4

work page 2023
[37]

Humangaus- sian: Text-driven 3d human generation with gaussian splat- ting

Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, and Ziwei Liu. Humangaus- sian: Text-driven 3d human generation with gaussian splat- ting. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 6646–6657,

work page
[38]

Revisit human-scene interaction via space occu- pancy

Xinpeng Liu, Haowen Hou, Yanchao Yang, Yong-Lu Li, and Cewu Lu. Revisit human-scene interaction via space occu- pancy. In European Conference on Computer Vision, pages 1–19. Springer, 2025. 1

work page 2025
[39]

Neural rays for occlusion-aware image-based render- ing

Yuan Liu, Sida Peng, Lingjie Liu, Qianqian Wang, Peng Wang, Christian Theobalt, Xiaowei Zhou, and Wenping Wang. Neural rays for occlusion-aware image-based render- ing. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 7824–7833,

work page
[40]

Citygaussian: Real-time high-quality large-scale scene rendering with gaussians

Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Jun- ran Peng, and Zhaoxiang Zhang. Citygaussian: Real-time high-quality large-scale scene rendering with gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2025. 3

work page 2025
[41]

Smpl: a skinned multi- person linear model

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. Smpl: a skinned multi- person linear model. ACM Transactions on Graphics (TOG), 34(6):1–16, 2015. 13

work page 2015
[42]

Splatfields: Neural gaussian splats for sparse 3d and 4d re- construction

Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, and Edmond Boyer. Splatfields: Neural gaussian splats for sparse 3d and 4d re- construction. In European Conference on Computer Vision, pages 313–332. Springer, 2025. 2

work page 2025
[43]

Nerf: Representing scenes as neural radiance fields for view syn- thesis

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. Communications of the ACM , 65(1):99–106, 2021. 3

work page 2021
[44]

Human gaussian 10 splatting: Real-time rendering of animatable avatars

Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, and Eduardo P ´erez-Pellitero. Human gaussian 10 splatting: Real-time rendering of animatable avatars. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 788–798, 2024. 3, 4

work page 2024
[45]

Instant neural graphics primitives with a mul- tiresolution hash encoding

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022. 3

work page 2022
[46]

Coherentgs: Sparse novel view synthesis with coherent 3d gaussians

Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, and Nima Khademi Kalan- tari. Coherentgs: Sparse novel view synthesis with coherent 3d gaussians. In European Conference on Computer Vision, pages 19–37. Springer, 2025. 2

work page 2025
[47]

Sparse multi-view hand-object reconstruction for unseen environ- ments

Yik Lung Pang, Changjae Oh, and Andrea Cavallaro. Sparse multi-view hand-object reconstruction for unseen environ- ments. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 803–810, 2024. 1

work page 2024
[48]

Ani- matable neural radiance fields for modeling dynamic human bodies

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Ani- matable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 14314–14323, 2021. 4, 14

work page 2021
[49]

Gendr: A generalized differentiable ren- derer

Felix Petersen, Bastian Goldluecke, Christian Borgelt, and Oliver Deussen. Gendr: A generalized differentiable ren- derer. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 4002–4011,

work page
[50]

Manus: Markerless grasp capture using articulated 3d gaussians

Chandradeep Pokhariya, Ishaan Nikhil Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, and Srinath Srid- har. Manus: Markerless grasp capture using articulated 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 2197– 2208, 2024. 2, 6, 7, 14

work page 2024
[51]

3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5020–5030, 2024. 3

work page 2024
[52]

Web ar: A promising future for mobile augmented reality—state of the art, chal- lenges, and insights

Xiuquan Qiao, Pei Ren, Schahram Dustdar, Ling Liu, Huadong Ma, and Junliang Chen. Web ar: A promising future for mobile augmented reality—state of the art, chal- lenges, and insights. Proceedings of the IEEE, 107(4):651– 666, 2019. 2

work page 2019
[53]

Em- bodied hands: modeling and capturing hands and bodies to- gether

Javier Romero, Dimitrios Tzionas, and Michael J Black. Em- bodied hands: modeling and capturing hands and bodies to- gether. ACM Transactions on Graphics (TOG), 36(6):1–17,

work page
[54]

Em- bodied hands: Modeling and capturing hands and bodies to- gether

Javier Romero, Dimitrios Tzionas, and Michael J Black. Em- bodied hands: Modeling and capturing hands and bodies to- gether. arXiv preprint arXiv:2201.02610, 2022. 4

work page arXiv 2022
[55]

Image quality assessment through fsim, ssim, mse and psnr—a comparative study

Umme Sara, Morium Akter, and Mohammad Shorif Ud- din. Image quality assessment through fsim, ssim, mse and psnr—a comparative study. Journal of Computer and Com- munications, 7(3):8–18, 2019. 14

work page 2019
[56]

Structure- from-motion revisited

Johannes L Schonberger and Jan-Michael Frahm. Structure- from-motion revisited. In Proceedings of the IEEE con- ference on computer vision and pattern recognition , pages 4104–4113, 2016. 2

work page 2016
[57]

Swings: sliding windows for dynamic 3d gaussian splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, and Eduardo P´erez-Pellitero. Swings: sliding windows for dynamic 3d gaussian splatting. In European Conference on Computer Vision, pages 37–54. Springer, 2025. 2

work page 2025
[58]

Ashwath Shetty, Marc Habermann, Guoxing Sun, Diogo Luvizon, Vladislav Golyanik, and Christian Theobalt. Holoported characters: Real-time free-viewpoint rendering of humans from sparse rgb cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1206–1215, 2024. 2
[59]

Harry Shum and Sing Bing Kang. Review of image-based rendering techniques. Visual Communications and Image Processing 2000, 4067:2–13, 2000. 2
[60]

Aljoscha Smolic, Karsten Mueller, Philipp Merkle, Tobias Rein, Matthias Kautzner, Peter Eisert, and Thomas Wiegand. Free viewpoint video extraction, representation, coding, and rendering. In 2004 International Conference on Image Processing, 2004. ICIP’04., pages 3287–3290. IEEE, 2004. 2
[61]

Shih-Yang Su, Timur Bagautdinov, and Helge Rhodin. Npc: Neural point characters from video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14795–14805, 2023. 2
[62]

Guoxing Sun, Xin Chen, Yizhang Chen, Anqi Pang, Pei Lin, Yuheng Jiang, Lan Xu, Jingyi Yu, and Jingya Wang. Neural free-viewpoint performance rendering under complex human-object interactions. In Proceedings of the 29th ACM International Conference on Multimedia, pages 4651–4660,
[63]

Xin Suo, Yuheng Jiang, Pei Lin, Yingliang Zhang, Minye Wu, Kaiwen Guo, and Lan Xu. Neuralhumanfvv: Real-time neural volumetric human performance rendering using rgb cameras. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6226–6237,
[64]

Omid Taheri, Nima Ghorbani, Michael J Black, and Dimitrios Tzionas. Grab: A dataset of whole-body human grasping of objects. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pages 581–600. Springer, 2020. 5
[65]

Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. Neurad: Neural rendering for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14895–14904, 2024. 1
[66]

Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, and Michael J Black. Deco: Dense estimation of 3d human-scene contact in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8001–8013, 2023. 5
[67]

Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P Srinivasan, Howard Zhou, Jonathan T Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser. Ibrnet: Learning multi-view image-based rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4690–4699, 2021. 7, 14
[68]

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004. 6, 14
[69]

Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. Humannerf: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16210–16220, 2022. 2
[70]

Markus Worchel and Marc Alexa. Differentiable rendering of parametric geometry. ACM Transactions on Graphics (TOG), 42(6):1–18, 2023. 3
[71]

Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, and Jianmin Zheng. Object-compositional neural implicit surfaces. In European Conference on Computer Vision, pages 197–213. Springer, 2022. 3
[72]

Wenqi Xian, Jia-Bin Huang, Johannes Kopf, and Changil Kim. Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9421–9431,
[73]

Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Relightable and animatable neural avatar from sparse-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 990–1000, 2024. 3
[74]

Bangbang Yang, Yinda Zhang, Yijin Li, Zhaopeng Cui, Sean Fanello, Hujun Bao, and Guofeng Zhang. Neural rendering in a room: amodal 3d understanding and free-viewpoint rendering for the closed scene composed of pre-captured objects. ACM Transactions on Graphics (TOG), 41(4):1–10,
[75]

Juze Zhang, Haimin Luo, Hongdi Yang, Xinru Xu, Qianyang Wu, Ye Shi, Jingyi Yu, Lan Xu, and Jingya Wang. Neuraldome: A neural modeling pipeline on multi-view human-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8834–8845, 2023. 2, 5, 6, 7, 14
[76]

Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, and Jingya Wang. Hoi-m^3: Capture multiple humans and objects interaction within contextual environment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 516–526, 2024. 3
[77]

Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, and Xiao Bai. Cor-gs: sparse-view 3d gaussian splatting via co-regularization. In European Conference on Computer Vision, pages 335–352. Springer, 2025. 2, 3
[78]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018. 6, 14
[79]

Chengfeng Zhao, Juze Zhang, Jiashen Du, Ziwei Shan, Junye Wang, Jingyi Yu, Jingya Wang, and Lan Xu. I’m hoi: Inertia-aware monocular capture of 3d human-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 729–741, 2024. 1
[80]

Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, and Andrew J Davison. In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15838–15847, 2021. 3


[1] [1]

Differentiable render- ing of neural sdfs through reparameterization

Sai Praveen Bangaru, Michael Gharbi, Fujun Luan, Tzu-Mao Li, Kalyan Sunkavalli, Milos Hasan, Sai Bi, Zexiang Xu, Gilbert Bernstein, and Fredo Durand. Differentiable render- ing of neural sdfs through reparameterization. InSIGGRAPH Asia 2022 Conference Papers, pages 1–9, 2022. 3

work page 2022

[2] [2]

4d visualization of dynamic events from unconstrained multi-view videos

Aayush Bansal, Minh V o, Yaser Sheikh, Deva Ramanan, and Srinivasa Narasimhan. 4d visualization of dynamic events from unconstrained multi-view videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5366–5375, 2020. 2

work page 2020

[3] [3]

Interaction networks for learning about objects, relations and physics

Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. Advances in neural in- formation processing systems, 29, 2016. 3

work page 2016

[4] [4]

Method for registration of 3-d shapes

Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Sensor fusion IV: control paradigms and data structures, pages 586–606. Spie, 1992. 5

work page 1992

[5] [5]

Behave: Dataset and method for tracking human object in- teractions

Bharat Lal Bhatnagar, Xianghui Xie, Ilya A Petrov, Cristian Sminchisescu, Christian Theobalt, and Gerard Pons-Moll. Behave: Dataset and method for tracking human object in- teractions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15935– 15946, 2022. 2

work page 2022

[6] [6]

Keep it smpl: Automatic estimation of 3d human pose and shape from a single image

Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J Black. Keep it smpl: Automatic estimation of 3d human pose and shape from a single image. In Computer Vision–ECCV 2016: 14th Euro- pean Conference, Amsterdam, The Netherlands, October 11- 14, 2016, Proceedings, Part V 14, pages 561–578. Springer,

work page 2016

[7] [7]

Flashback: Immersive virtual reality on mobile devices via rendering memoization

Kevin Boos, David Chu, and Eduardo Cuervo. Flashback: Immersive virtual reality on mobile devices via rendering memoization. In Proceedings of the 14th Annual Interna- tional Conference on Mobile Systems, Applications, and Ser- vices, pages 291–304, 2016. 2

work page 2016

[8] [8]

Hexplane: A fast representa- tion for dynamic scenes

Ang Cao and Justin Johnson. Hexplane: A fast representa- tion for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 130–141, 2023. 3

work page 2023

[9] [9]

Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images

Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2025. 2

work page 2025

[10] [10]

High-quality streamable free-viewpoint video

Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Den- nis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. High-quality streamable free-viewpoint video. ACM Transactions on Graphics (ToG) , 34(4):1–13,

work page

[11] [11]

point diffu- sion implicit function for large-scale scene neural represen- tation

Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu, Gang Yu, and Tao Chen. point diffu- sion implicit function for large-scale scene neural represen- tation. Advances in Neural Information Processing Systems, 36, 2024. 3

work page 2024

[12] [12]

Motion2fusion: Real-time volumetric performance capture

Mingsong Dou, Philip Davidson, Sean Ryan Fanello, Sameh Khamis, Adarsh Kowdle, Christoph Rhemann, Vladimir Tankovich, and Shahram Izadi. Motion2fusion: Real-time volumetric performance capture. ACM Transactions on Graphics (ToG), 36(6):1–16, 2017. 2

work page 2017

[13] [13]

3d gaussian splatting as new era: A survey

Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 3d gaussian splatting as new era: A survey. IEEE Transactions on Visualization and Computer Graphics, 2024. 2

work page 2024

[14] [14]

Associated reality: A cognitive human–machine layer for autonomous driving

Felipe Fernandez, Angel Sanchez, Jose F Velez, and Belen Moreno. Associated reality: A cognitive human–machine layer for autonomous driving. Robotics and Autonomous Systems, 133:103624, 2020. 1

work page 2020

[15] [15]

Miruoto: Sports event atmo- sphere visual rendering through real-time image and sound processing system

Guillaume Gourmelen, Shutaro Toriya, Eiko Miya, Naohisa Shioura, and Hiroyasu Iwata. Miruoto: Sports event atmo- sphere visual rendering through real-time image and sound processing system. In ACM SIGGRAPH 2024 Emerging Technologies, pages 1–2. 2024. 1

work page 2024

[16] [16]

Observing human-object interactions: Using spatial and functional compatibility for recognition

Abhinav Gupta, Aniruddha Kembhavi, and Larry S Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. IEEE transactions on pattern analysis and machine intelligence , 31(10):1775– 1789, 2009. 3

work page 2009

[17] [17]

Resolving 3d human pose ambiguities with 3d scene constraints

Mohamed Hassan, Vasileios Choutas, Dimitrios Tzionas, and Michael J Black. Resolving 3d human pose ambiguities with 3d scene constraints. In Proceedings of the IEEE/CVF international conference on computer vision , pages 2282– 2292, 2019. 3

work page 2019

[18] [18]

Populating 3d scenes by learning human-scene interaction

Mohamed Hassan, Partha Ghosh, Joachim Tesch, Dim- itrios Tzionas, and Michael J Black. Populating 3d scenes by learning human-scene interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14708–14718, 2021. 3

work page 2021

[19] [19]

Hand-object interaction controller (hoic): Deep reinforce- ment learning for reconstructing interactions with physics

Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, and Feng Xu. Hand-object interaction controller (hoic): Deep reinforce- ment learning for reconstructing interactions with physics. In ACM SIGGRAPH 2024 Conference Papers , pages 1–10,

work page 2024

[20] [20]

Gauhuman: Articu- lated gaussian splatting from monocular human videos

Shoukang Hu, Tao Hu, and Ziwei Liu. Gauhuman: Articu- lated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20418–20431, 2024. 2, 3, 4

work page 2024

[21] [21]

Capturing and inferring dense full-body human-scene contact

Chun-Hao P Huang, Hongwei Yi, Markus H ¨oschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, and Michael J Black. Capturing and inferring dense full-body human-scene contact. In Proceedings of 9 the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13274–13285, 2022. 1

work page 2022

[22] [22]

Arch: Animatable reconstruction of clothed hu- mans

Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, and Tony Tung. Arch: Animatable reconstruction of clothed hu- mans. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 3093–3102,

work page

[23] [23]

Interactive synthesis of human- object interaction

Sumit Jain and C Karen Liu. Interactive synthesis of human- object interaction. In Proceedings of the 2009 ACM SIG- GRAPH/Eurographics Symposium on Computer Animation, pages 47–53, 2009. 3

work page 2009

[24] [24]

Flexnerf: Photorealistic free- viewpoint rendering of moving humans from sparse views

Vinoj Jayasundara, Amit Agrawal, Nicolas Heron, Abhinav Shrivastava, and Larry S Davis. Flexnerf: Photorealistic free- viewpoint rendering of moving humans from sparse views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21118–21127, 2023. 2

work page 2023

[25] [25]

Neuralhofu- sion: Neural volumetric rendering under human-object in- teractions

Yuheng Jiang, Suyi Jiang, Guoxing Sun, Zhuo Su, Kai- wen Guo, Minye Wu, Jingyi Yu, and Lan Xu. Neuralhofu- sion: Neural volumetric rendering under human-object in- teractions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 6155– 6165, 2022. 2

work page 2022

[26] [26]

Instant-nvr: Instant neural volumetric ren- dering for human-object interactions from monocular rgbd stream

Yuheng Jiang, Kaixin Yao, Zhuo Su, Zhehao Shen, Haimin Luo, and Lan Xu. Instant-nvr: Instant neural volumetric ren- dering for human-object interactions from monocular rgbd stream. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 595–605,

work page

[27] [27]

End-to-end recovery of human shape and pose

Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 7122–7131, 2018. 4, 14

work page 2018

[28] [28]

3d gaussian splatting for real-time radiance field rendering

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1,

work page

[29] [29]

Hugs: Human gaussian splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. Hugs: Human gaussian splats. In Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition , pages 505–515, 2024. 2, 3, 4

work page 2024

[30] [30]

Human action recognition and predic- tion: A survey

Yu Kong and Yun Fu. Human action recognition and predic- tion: A survey. International Journal of Computer Vision , 130(5):1366–1401, 2022. 1

work page 2022

[31] [31]

Gen- eralizable human gaussians for sparse view synthesis

Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella- Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, et al. Gen- eralizable human gaussians for sparse view synthesis. In European Conference on Computer Vision, pages 451–468. Springer, 2025. 3

work page 2025

[32] [32]

Gp- nerf: Generalized perception nerf for context-aware 3d scene understanding

Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, and Junwei Han. Gp- nerf: Generalized perception nerf for context-aware 3d scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21708– 21718, 2024. 3

work page 2024

[33] [33]

Task-oriented human-object interactions generation with implicit neural representations

Quanzhou Li, Jingbo Wang, Chen Change Loy, and Bo Dai. Task-oriented human-object interactions generation with implicit neural representations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3035–3044, 2024. 1

work page 2024

[34] [34]

Para- metric model-based 3d human shape and pose estimation from multiple views

Zhongguo Li, Anders Heyden, and Magnus Oskarsson. Para- metric model-based 3d human shape and pose estimation from multiple views. In Image Analysis: 21st Scandinavian Conference, SCIA 2019, Norrk ¨oping, Sweden, June 11–13, 2019, Proceedings 21, pages 336–347. Springer, 2019. 4

work page 2019

[35] [35]

Learning implicit templates for point-based clothed human modeling

Siyou Lin, Hongwen Zhang, Zerong Zheng, Ruizhi Shao, and Yebin Liu. Learning implicit templates for point-based clothed human modeling. In European Conference on Com- puter Vision, pages 210–228. Springer, 2022. 4

work page 2022

[36] [36]

Hosnerf: Dynamic human-object-scene neural ra- diance fields from a single video

Jia-Wei Liu, Yan-Pei Cao, Tianyuan Yang, Zhongcong Xu, Jussi Keppo, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Hosnerf: Dynamic human-object-scene neural ra- diance fields from a single video. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 18483–18494, 2023. 2, 4

work page 2023

[37] [37]

Humangaus- sian: Text-driven 3d human generation with gaussian splat- ting

Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, and Ziwei Liu. Humangaus- sian: Text-driven 3d human generation with gaussian splat- ting. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 6646–6657,

work page

[38] [38]

Revisit human-scene interaction via space occu- pancy

Xinpeng Liu, Haowen Hou, Yanchao Yang, Yong-Lu Li, and Cewu Lu. Revisit human-scene interaction via space occu- pancy. In European Conference on Computer Vision, pages 1–19. Springer, 2025. 1

work page 2025

[39] [39]

Neural rays for occlusion-aware image-based render- ing

Yuan Liu, Sida Peng, Lingjie Liu, Qianqian Wang, Peng Wang, Christian Theobalt, Xiaowei Zhou, and Wenping Wang. Neural rays for occlusion-aware image-based render- ing. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 7824–7833,

work page

[40] [40]

Citygaussian: Real-time high-quality large-scale scene rendering with gaussians

Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Jun- ran Peng, and Zhaoxiang Zhang. Citygaussian: Real-time high-quality large-scale scene rendering with gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2025. 3

work page 2025

[41] [41]

Smpl: a skinned multi- person linear model

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. Smpl: a skinned multi- person linear model. ACM Transactions on Graphics (TOG), 34(6):1–16, 2015. 13

work page 2015

[42] [42]

Splatfields: Neural gaussian splats for sparse 3d and 4d re- construction

Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, and Edmond Boyer. Splatfields: Neural gaussian splats for sparse 3d and 4d re- construction. In European Conference on Computer Vision, pages 313–332. Springer, 2025. 2

work page 2025

[43] [43]

Nerf: Representing scenes as neural radiance fields for view syn- thesis

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. Communications of the ACM , 65(1):99–106, 2021. 3

work page 2021

[44] [44]

Human gaussian 10 splatting: Real-time rendering of animatable avatars

Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, and Eduardo P ´erez-Pellitero. Human gaussian 10 splatting: Real-time rendering of animatable avatars. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 788–798, 2024. 3, 4

work page 2024

[45] [45]

Instant neural graphics primitives with a mul- tiresolution hash encoding

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022. 3

work page 2022

[46] [46]

Coherentgs: Sparse novel view synthesis with coherent 3d gaussians

Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, and Nima Khademi Kalan- tari. Coherentgs: Sparse novel view synthesis with coherent 3d gaussians. In European Conference on Computer Vision, pages 19–37. Springer, 2025. 2

work page 2025

[47] [47]

Sparse multi-view hand-object reconstruction for unseen environ- ments

Yik Lung Pang, Changjae Oh, and Andrea Cavallaro. Sparse multi-view hand-object reconstruction for unseen environ- ments. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 803–810, 2024. 1

work page 2024

[48] [48]

Ani- matable neural radiance fields for modeling dynamic human bodies

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Ani- matable neural radiance fields for modeling dynamic human bodies. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 14314–14323, 2021. 4, 14

work page 2021

[49] [49]

Gendr: A generalized differentiable ren- derer

Felix Petersen, Bastian Goldluecke, Christian Borgelt, and Oliver Deussen. Gendr: A generalized differentiable ren- derer. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition , pages 4002–4011,

work page

[50] [50]

Manus: Markerless grasp capture using articulated 3d gaussians

Chandradeep Pokhariya, Ishaan Nikhil Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, and Srinath Srid- har. Manus: Markerless grasp capture using articulated 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 2197– 2208, 2024. 2, 6, 7, 14

work page 2024

[51] [51]

3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5020–5030, 2024. 3

work page 2024

[52] [52]

Web ar: A promising future for mobile augmented reality—state of the art, chal- lenges, and insights

Xiuquan Qiao, Pei Ren, Schahram Dustdar, Ling Liu, Huadong Ma, and Junliang Chen. Web ar: A promising future for mobile augmented reality—state of the art, chal- lenges, and insights. Proceedings of the IEEE, 107(4):651– 666, 2019. 2

work page 2019

[53] [53]

Em- bodied hands: modeling and capturing hands and bodies to- gether

Javier Romero, Dimitrios Tzionas, and Michael J Black. Em- bodied hands: modeling and capturing hands and bodies to- gether. ACM Transactions on Graphics (TOG), 36(6):1–17,

work page

[54] [54]

Em- bodied hands: Modeling and capturing hands and bodies to- gether

Javier Romero, Dimitrios Tzionas, and Michael J Black. Em- bodied hands: Modeling and capturing hands and bodies to- gether. arXiv preprint arXiv:2201.02610, 2022. 4

work page arXiv 2022

[55] [55]

Image quality assessment through fsim, ssim, mse and psnr—a comparative study

Umme Sara, Morium Akter, and Mohammad Shorif Ud- din. Image quality assessment through fsim, ssim, mse and psnr—a comparative study. Journal of Computer and Com- munications, 7(3):8–18, 2019. 14

work page 2019

[56] [56]

Structure- from-motion revisited

Johannes L Schonberger and Jan-Michael Frahm. Structure- from-motion revisited. In Proceedings of the IEEE con- ference on computer vision and pattern recognition , pages 4104–4113, 2016. 2

work page 2016

[57] [57]

Swings: sliding windows for dynamic 3d gaussian splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, and Eduardo P´erez-Pellitero. Swings: sliding windows for dynamic 3d gaussian splatting. In European Conference on Computer Vision, pages 37–54. Springer, 2025. 2

work page 2025

[58] [58]

Holo- ported characters: Real-time free-viewpoint rendering of humans from sparse rgb cameras

Ashwath Shetty, Marc Habermann, Guoxing Sun, Diogo Lu- vizon, Vladislav Golyanik, and Christian Theobalt. Holo- ported characters: Real-time free-viewpoint rendering of humans from sparse rgb cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1206–1215, 2024. 2

work page 2024

[59] [59]

Review of image-based rendering techniques

Harry Shum and Sing Bing Kang. Review of image-based rendering techniques. Visual Communications and Image Processing 2000, 4067:2–13, 2000. 2

work page 2000

[60] [60]

Free viewpoint video extraction, representation, coding, and rendering

Aljoscha Smolic, Karsten Mueller, Philipp Merkle, Tobias Rein, Matthias Kautzner, Peter Eisert, and Thomas Wiegand. Free viewpoint video extraction, representation, coding, and rendering. In 2004 International Conference on Image Pro- cessing, 2004. ICIP’04., pages 3287–3290. IEEE, 2004. 2

work page 2004

[61] [61]

Npc: Neural point characters from video

Shih-Yang Su, Timur Bagautdinov, and Helge Rhodin. Npc: Neural point characters from video. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 14795–14805, 2023. 2

work page 2023

[62] [62]

Neural free-viewpoint performance rendering under complex human-object interactions

Guoxing Sun, Xin Chen, Yizhang Chen, Anqi Pang, Pei Lin, Yuheng Jiang, Lan Xu, Jingyi Yu, and Jingya Wang. Neural free-viewpoint performance rendering under complex human-object interactions. In Proceedings of the 29th ACM International Conference on Multimedia, pages 4651–4660,

work page

[63] [63]

Neuralhumanfvv: Real-time neural volumetric human performance rendering using rgb cameras

Xin Suo, Yuheng Jiang, Pei Lin, Yingliang Zhang, Minye Wu, Kaiwen Guo, and Lan Xu. Neuralhumanfvv: Real-time neural volumetric human performance rendering using rgb cameras. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6226–6237,

work page

[64] [64]

Grab: A dataset of whole-body human grasp- ing of objects

Omid Taheri, Nima Ghorbani, Michael J Black, and Dim- itrios Tzionas. Grab: A dataset of whole-body human grasp- ing of objects. In Computer Vision–ECCV 2020: 16th Eu- ropean Conference, Glasgow, UK, August 23–28, 2020, Pro- ceedings, Part IV 16, pages 581–600. Springer, 2020. 5

work page 2020

[65] [65]

Neurad: Neural rendering for autonomous driving

Adam Tonderski, Carl Lindstr ¨om, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. Neurad: Neural rendering for autonomous driving. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14895–14904, 2024. 1

work page 2024

[66] [66]

Deco: Dense estimation of 3d human-scene contact in the wild

Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, and Michael J Black. Deco: Dense estimation of 3d human-scene contact in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8001–8013, 2023. 5

work page 2023

[67] [67]

Ibr- net: Learning multi-view image-based rendering

Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P Srinivasan, Howard Zhou, Jonathan T Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser. Ibr- net: Learning multi-view image-based rendering. In Pro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4690–4699, 2021. 7, 14 11

work page 2021

[68] [68]

Image quality assessment: from error visibility to structural similarity

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Si- moncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004. 6, 14

work page 2004

[69] [69]

Hu- mannerf: Free-viewpoint rendering of moving people from monocular video

Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. Hu- mannerf: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF con- ference on computer vision and pattern Recognition , pages 16210–16220, 2022. 2

work page 2022

[70] [70]

Differentiable render- ing of parametric geometry

Markus Worchel and Marc Alexa. Differentiable render- ing of parametric geometry. ACM Transactions on Graphics (TOG), 42(6):1–18, 2023. 3

work page 2023

[71] [71]

Object- compositional neural implicit surfaces

Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, and Jianmin Zheng. Object- compositional neural implicit surfaces. In European Con- ference on Computer Vision, pages 197–213. Springer, 2022. 3

work page 2022

[72] [72]

Space-time neural irradiance fields for free-viewpoint video

Wenqi Xian, Jia-Bin Huang, Johannes Kopf, and Changil Kim. Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9421–9431,

[73]

Relightable and animatable neural avatar from sparse-view video

Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Relightable and animatable neural avatar from sparse-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 990–1000, 2024. 3

[74]

Neural rendering in a room: amodal 3d understanding and free-viewpoint rendering for the closed scene composed of pre-captured objects

Bangbang Yang, Yinda Zhang, Yijin Li, Zhaopeng Cui, Sean Fanello, Hujun Bao, and Guofeng Zhang. Neural rendering in a room: amodal 3d understanding and free-viewpoint rendering for the closed scene composed of pre-captured objects. ACM Transactions on Graphics (TOG), 41(4):1–10,

[75]

Neuraldome: A neural modeling pipeline on multi-view human-object interactions

Juze Zhang, Haimin Luo, Hongdi Yang, Xinru Xu, Qianyang Wu, Ye Shi, Jingyi Yu, Lan Xu, and Jingya Wang. Neuraldome: A neural modeling pipeline on multi-view human-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8834–8845, 2023. 2, 5, 6, 7, 14

[76]

Hoi-m^3: Capture multiple humans and objects interaction within contextual environment

Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, and Jingya Wang. Hoi-m^3: Capture multiple humans and objects interaction within contextual environment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 516–526, 2024. 3

[77]

Cor-gs: sparse-view 3d gaussian splatting via co-regularization

Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, and Xiao Bai. Cor-gs: sparse-view 3d gaussian splatting via co-regularization. In European Conference on Computer Vision, pages 335–352. Springer, 2025. 2, 3

[78]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018. 6, 14

[79]

I’m hoi: Inertia-aware monocular capture of 3d human-object interactions

Chengfeng Zhao, Juze Zhang, Jiashen Du, Ziwei Shan, Junye Wang, Jingyi Yu, Jingya Wang, and Lan Xu. I’m hoi: Inertia-aware monocular capture of 3d human-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 729–741, 2024. 1

[80]

In-place scene labelling and understanding with implicit scene representation

Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, and Andrew J Davison. In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15838–15847, 2021. 3
