Recognition: no theorem link
VRGaussianAvatar: Integrating 3D Gaussian Splatting Avatars into VR
Pith reviewed 2026-05-16 08:49 UTC · model grok-4.3
The pith
A system renders real-time full-body 3D Gaussian avatars in VR from head-mounted display tracking alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VRGaussianAvatar enables real-time full-body 3D Gaussian Splatting avatars in virtual reality via a parallel pipeline: a VR frontend estimates full-body pose from head-mounted display tracking with inverse kinematics and streams the result to a Gaussian Avatar (GA) backend that performs stereoscopic rendering with binocular batching. In quantitative tests and a within-subject user study, this yields higher perceived appearance similarity, embodiment, and plausibility than image- or video-based mesh avatar baselines.
What carries the argument
The parallel VR-frontend-plus-GA-backend pipeline, where inverse kinematics converts head-mounted display signals into full-body poses and binocular batching jointly renders left and right eye views of a single-image 3D Gaussian Splatting avatar to cut redundant computation.
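To make the binocular-batching idea concrete, here is a minimal sketch of rendering both eye views in one batched rasterization pass instead of two sequential ones. The rasterizer interface and the gaussian/camera field names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of binocular batching, assuming a rasterizer that accepts a
# leading camera-batch dimension; `rasterize` and all field names are
# hypothetical, not taken from the paper's code.
import torch

def render_stereo_batched(gaussians, left_cam, right_cam, rasterize):
    """Render left and right eye views in a single batched pass."""
    # Stacking the per-eye view/projection matrices lets view-independent
    # per-Gaussian work be shared across both eyes in one call.
    viewmats = torch.stack([left_cam["view"], right_cam["view"]])  # (2, 4, 4)
    projmats = torch.stack([left_cam["proj"], right_cam["proj"]])  # (2, 4, 4)
    images = rasterize(
        gaussians["means"], gaussians["covs"], gaussians["colors"],
        viewmats, projmats,
    )  # (2, H, W, 3): one image per eye
    return images[0], images[1]
```

Presumably the saving comes from doing the shared per-Gaussian preprocessing once rather than twice, with only the per-eye projection and compositing differing; the paper's text states the goal (reducing redundant computation) without detailing the split.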
Load-bearing premise
Inverse kinematics from head-mounted display tracking alone produces accurate and natural full-body poses, and single-image 3D Gaussian Splatting reconstruction yields avatars of adequate visual quality under VR viewing conditions.
What would settle it
A replication of the within-subject user study in which participants show no statistically significant improvement in embodiment or plausibility scores over the mesh avatar baselines would falsify the perceptual advantage claim.
Original abstract
We present VRGaussianAvatar, an integrated system that enables real-time full-body 3D Gaussian Splatting (3DGS) avatars in virtual reality using only head-mounted display (HMD) tracking signals. The system adopts a parallel pipeline with a VR Frontend and a GA Backend. The VR Frontend uses inverse kinematics to estimate full-body pose and streams the resulting pose along with stereo camera parameters to the backend. The GA Backend stereoscopically renders a 3DGS avatar reconstructed from a single image. To improve stereo rendering efficiency, we introduce Binocular Batching, which jointly processes left and right eye views in a single batched pass to reduce redundant computation and support high-resolution VR displays. We evaluate VRGaussianAvatar with quantitative performance tests and a within-subject user study against image- and video-based mesh avatar baselines. Results show that VRGaussianAvatar sustains interactive VR performance and yields higher perceived appearance similarity, embodiment, and plausibility. Project page and source code are available at https://vrgaussianavatar.github.io.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents VRGaussianAvatar, a system for real-time full-body 3D Gaussian Splatting avatars in VR using only HMD tracking signals. It consists of a VR Frontend applying inverse kinematics to estimate full-body poses from head and controller data, streamed to a GA Backend that performs stereoscopic rendering of a single-image reconstructed 3DGS avatar. A novel Binocular Batching technique is introduced to jointly process left and right eye views for efficiency on high-resolution VR displays. The system is evaluated via quantitative performance tests and a within-subject user study against image- and video-based mesh avatar baselines, claiming interactive VR frame rates and superior perceived appearance similarity, embodiment, and plausibility.
Significance. If the performance and user-study claims hold under rigorous validation, the work would represent a meaningful practical advance in immersive VR by enabling high-fidelity, real-time 3DGS avatars with minimal tracking hardware. The parallel pipeline design and Binocular Batching optimization address concrete rendering bottlenecks in stereo VR, and the open release of code and project page supports reproducibility. The contribution is primarily engineering-oriented rather than theoretical, with potential impact on VR applications if the untested IK pose quality assumption is confirmed.
major comments (3)
- [VR Frontend] VR Frontend section: The inverse kinematics method for full-body pose estimation from HMD and controller signals alone receives no quantitative validation (e.g., MPJPE against ground-truth MoCap, foot-sliding metrics, or perceptual pose naturalness ratings). This directly undermines the central user-study claims of improved embodiment and plausibility, as the skeptic analysis correctly identifies that standard IK solvers often produce unnatural lower-body configurations under upper-body constraints only.
- [Evaluation] Evaluation section: The within-subject user study and quantitative performance tests report higher scores for appearance similarity, embodiment, and plausibility but provide no participant count, statistical test details (e.g., p-values, effect sizes), exact metric definitions, or exclusion criteria. Without these, the superiority claims over mesh baselines cannot be independently verified and the soundness rating remains low.
- [GA Backend] Binocular Batching description: The technique is presented as jointly processing stereo views to reduce redundant computation, yet no ablation results, timing breakdowns, or comparisons (e.g., FPS with vs. without batching at target VR resolutions) are supplied to quantify the efficiency gain or confirm it sustains interactive performance.
minor comments (2)
- [Abstract] The abstract states positive outcomes but omits any numerical values for frame rates, latency, or user-study scores, reducing immediate clarity.
- [System Overview] Notation for stereo camera parameters and pose streaming between frontend and backend could be formalized with a diagram or pseudocode for reproducibility.
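As an illustration of the kind of formalization this comment asks for, a hypothetical message layout for the frontend-to-backend stream might look like the following; all field names and shapes are invented for clarity and are not taken from the paper.

```python
# Hypothetical frontend-to-backend message layout; fields are illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class StereoCamera:
    view: np.ndarray            # (4, 4) world-to-eye view matrix
    proj: np.ndarray            # (4, 4) projection matrix
    resolution: tuple           # (width, height) per eye

@dataclass
class AvatarFrame:
    timestamp_ms: float
    body_pose: np.ndarray       # e.g. (J, 3) per-joint rotations from the IK solver
    root_transform: np.ndarray  # (4, 4) global root pose
    left_eye: StereoCamera
    right_eye: StereoCamera
```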
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully addressed each major comment below and revised the manuscript to incorporate additional validation, methodological details, and quantitative results where the original submission was lacking.
Point-by-point responses
Referee: [VR Frontend] VR Frontend section: The inverse kinematics method for full-body pose estimation from HMD and controller signals alone receives no quantitative validation (e.g., MPJPE against ground-truth MoCap, foot-sliding metrics, or perceptual pose naturalness ratings). This directly undermines the central user-study claims of improved embodiment and plausibility, as the skeptic analysis correctly identifies that standard IK solvers often produce unnatural lower-body configurations under upper-body constraints only.
Authors: We acknowledge that the original manuscript did not include quantitative validation of the IK solver. To address this, the revised version adds a dedicated evaluation subsection using a public MoCap dataset to report MPJPE and foot-sliding metrics for upper-body-constrained poses. These results support the plausibility of the poses driving the user-study comparisons, while noting that the IK component follows established VR practices. revision: yes
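For reference, the two pose metrics named above can be computed along these lines. This is a generic sketch assuming (T, J, 3) joint arrays in metres with the vertical axis last; the 5 cm contact threshold is an assumption, not a value from the paper.

```python
# Generic sketch of MPJPE and foot-sliding; shapes and thresholds are assumptions.
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error (metres) over (T, J, 3) joint arrays."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def foot_sliding(foot_pos, contact_thresh=0.05):
    """Mean horizontal foot drift (metres) on frames where the foot is near the ground.

    foot_pos: (T, 3) foot positions with the vertical axis last.
    """
    horiz = np.linalg.norm(np.diff(foot_pos[:, :2], axis=0), axis=-1)  # per-frame XY motion
    in_contact = foot_pos[1:, 2] < contact_thresh                      # likely ground contact
    return horiz[in_contact].mean() if in_contact.any() else 0.0
```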
Referee: [Evaluation] Evaluation section: The within-subject user study and quantitative performance tests report higher scores for appearance similarity, embodiment, and plausibility but provide no participant count, statistical test details (e.g., p-values, effect sizes), exact metric definitions, or exclusion criteria. Without these, the superiority claims over mesh baselines cannot be independently verified and the soundness rating remains low.
Authors: We agree the original Evaluation section omitted key methodological details. The revised manuscript now specifies the participant count (N=12), reports paired t-test results with p-values and effect sizes, provides exact Likert-scale metric definitions, and lists exclusion criteria. These additions enable independent verification of the reported advantages in appearance similarity, embodiment, and plausibility. revision: yes
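A minimal sketch of the described analysis, assuming per-participant Likert scores for the proposed system and one baseline; the choice of paired t-tests follows the rebuttal text and is not verified against the paper.

```python
# Sketch of a within-subject comparison: paired t-test plus Cohen's d.
import numpy as np
from scipy import stats

def paired_comparison(scores_ours, scores_baseline):
    """Return (t, p, d) for within-subject (repeated-measures) score comparisons."""
    diffs = np.asarray(scores_ours, float) - np.asarray(scores_baseline, float)
    t, p = stats.ttest_rel(scores_ours, scores_baseline)
    d = diffs.mean() / diffs.std(ddof=1)   # Cohen's d for paired samples
    return t, p, d
```

Note that ordinal Likert data from a small sample is often analyzed with nonparametric procedures (e.g., Wilcoxon signed-rank or the aligned rank transform) rather than t-tests, so the exact test choice matters for the claimed significance.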
Referee: [GA Backend] Binocular Batching description: The technique is presented as jointly processing stereo views to reduce redundant computation, yet no ablation results, timing breakdowns, or comparisons (e.g., FPS with vs. without batching at target VR resolutions) are supplied to quantify the efficiency gain or confirm it sustains interactive performance.
Authors: We thank the referee for noting the absence of quantitative support for Binocular Batching. The revised Evaluation section includes new ablation studies with timing breakdowns and direct FPS comparisons (with vs. without batching) at target VR resolutions such as 1832x1920 per eye. These results confirm the efficiency improvements while maintaining interactive frame rates. revision: yes
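The ablation the authors describe reduces to a timing harness of roughly the following shape; the `render_stereo` callable is a placeholder, and on a GPU one would synchronize before reading the clock.

```python
# Sketch of an FPS ablation: same workload with and without binocular batching.
import time

def measure_fps(render_fn, n_frames=500, warmup=50):
    for _ in range(warmup):            # warm up shaders/caches before timing
        render_fn()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_fn()
    # On a GPU backend, synchronize here before stopping the clock.
    return n_frames / (time.perf_counter() - start)

# fps_batched  = measure_fps(lambda: render_stereo(batched=True))
# fps_separate = measure_fps(lambda: render_stereo(batched=False))
```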
Circularity Check
No significant circularity in the system pipeline.
full rationale
The paper describes a forward engineering pipeline: HMD signals feed an IK solver in the VR Frontend to produce full-body poses, which are then streamed to a GA Backend that renders a single-image 3DGS avatar using the newly introduced Binocular Batching optimization. No equations, fitted parameters, or predictions are defined in terms of the final results; the user-study and performance metrics are external measurements rather than quantities that reduce to the inputs by construction. No load-bearing self-citations or uniqueness theorems appear in the provided text.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: inverse kinematics accurately estimates full-body pose from HMD tracking alone.
- Domain assumption: 3D Gaussian Splatting supports real-time stereoscopic avatar rendering.
invented entities (1)
- Binocular Batching (no independent evidence)