High-Fidelity Mobile Avatars with Pruned Local Blendshapes
Pith reviewed 2026-05-10 14:48 UTC · model grok-4.3
The pith
Pruned local blendshapes on nearby Gaussians let mobile devices render detailed 2K human avatars at 120 FPS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that local linear blendshapes applied to small body parts can capture global nonlinear pose-dependent changes in Gaussian attributes more accurately than global nonlinear methods because nearby Gaussians are highly correlated within local regions, and that pruning blendshapes for Gaussians whose attributes change little produces a minimal representation sufficient for high-fidelity rendering.
What carries the argument
pruned local blendshapes that linearly combine pose features within small body regions to model correlated changes in nearby Gaussian attributes
If this is right
- Avatars achieve 120 FPS at 2K resolution on mobile hardware while retaining better fine details than prior distilled global methods.
- The representation runs on multiple devices through a WebGPU implementation without needing a pretrained model.
- Model size shrinks because blendshapes are removed for Gaussians with minimal attribute variation.
- End-to-end training from multi-view video produces the avatars directly.
Where Pith is reading between the lines
- The same local-correlation pruning might reduce compute for other dynamic 3D elements such as loose clothing or facial expressions.
- Smaller per-avatar storage could let apps host libraries of many characters on-device for instant switching.
- The efficiency gain may support extended real-time sessions or modestly higher resolutions in mobile VR without overheating.
- The assumption of local linearity would be tested by measuring error on extreme poses or diverse body shapes where correlations might break.
Load-bearing premise
Nearby Gaussians within a local body region are highly correlated so their pose-dependent attribute changes can be modeled linearly with less error than global nonlinear combinations.
What would settle it
A side-by-side test on a new multi-view video dataset where the local pruned method produces visibly lower detail or requires more than 8 ms per frame at 2K resolution compared with a global nonlinear baseline at matched model size.
Figures
read the original abstract
We propose a method to reconstruct high-fidelity human avatars from multi-view video that can run on mobile devices. Many works can model high-quality Gaussian-based full-body avatars from multi-view video. However, these methods require heavy computation to obtain pose-dependent appearance, making deployment on mobile devices very difficult. Recent methods distill from pretrained models and model pose-dependent nonlinear Gaussian attributes by linearly combining global pose features with blendshapes. Although they can run on mobile devices, they suffer some loss of detail. We observe that nearby Gaussians are often highly correlated within a local region of the body, and can be linearly modeled with less error. Therefore, we use local linear blendshapes in small body parts to capture global nonlinear changes of Gaussian attributes. To further reduce computation and model size, we propose to remove blendshapes for Gaussians whose attributes change little, yielding a minimal blendshape representation. Our method is an end-to-end training method without a pretrained model. To make it run on multiple devices, we implement our method using WebGPU. Experiments show that our method can render high-quality human avatars with better details, and can reach 120 FPS at 2K resolution on mobile devices.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes reconstructing high-fidelity full-body human avatars from multi-view video using 3D Gaussian splatting augmented with pruned local blendshapes. It partitions the body into small regions, applies local linear blendshapes to capture pose-dependent nonlinear changes in Gaussian attributes (position, color, opacity, scale, rotation), prunes blendshapes for Gaussians with low attribute variation, trains the entire pipeline end-to-end without a pretrained teacher model, and implements the renderer in WebGPU to target mobile devices. The central experimental claim is that this yields higher visual detail than prior mobile distillation methods while achieving 120 FPS at 2K resolution.
Significance. If the quality and speed claims are substantiated, the work would meaningfully advance practical deployment of detailed 3D avatars on consumer mobile hardware for AR/VR and telepresence. The combination of local linear modeling, explicit pruning, end-to-end training, and WebGPU implementation addresses a clear deployment gap between high-quality desktop Gaussian avatars and lighter mobile alternatives.
major comments (2)
- [§4 Experiments, §3.2] §4 Experiments and §3.2 Local Blendshape Modeling: the abstract and introduction assert 'better details' and 120 FPS at 2K on mobile, yet the manuscript provides no quantitative metrics (PSNR, SSIM, LPIPS, or perceptual user studies), no baseline tables, and no error analysis or ablation on local vs. global modeling error. These omissions are load-bearing for the central claim that local linear blendshapes reduce modeling error relative to global nonlinear methods.
- [§3.3 Pruning] §3.3 Pruning and §3.1: the blendshape pruning threshold is listed as a free hyperparameter with no sensitivity analysis, no reported trade-off curves between pruned model size/FPS and reconstruction error, and no justification that the chosen threshold generalizes across subjects or poses. This directly affects the 'minimal blendshape representation' and mobile performance claims.
minor comments (2)
- [§3.1] Notation for local blendshape coefficients and region partitioning is introduced without an explicit equation or diagram showing how Gaussians are assigned to body parts.
- [§4.3 Implementation] WebGPU implementation details (shader structure, memory layout for pruned blendshapes) are mentioned but not accompanied by pseudocode or performance breakdown by stage.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript to strengthen the experimental validation of our claims.
read point-by-point responses
-
Referee: [§4 Experiments, §3.2] §4 Experiments and §3.2 Local Blendshape Modeling: the abstract and introduction assert 'better details' and 120 FPS at 2K on mobile, yet the manuscript provides no quantitative metrics (PSNR, SSIM, LPIPS, or perceptual user studies), no baseline tables, and no error analysis or ablation on local vs. global modeling error. These omissions are load-bearing for the central claim that local linear blendshapes reduce modeling error relative to global nonlinear methods.
Authors: We agree that quantitative metrics and ablations are necessary to fully support the central claims. In the revised manuscript we will add a results table reporting PSNR, SSIM, and LPIPS against the relevant baselines (including global-blendshape variants), together with an explicit ablation comparing local versus global linear modeling error on the same Gaussian attributes. Runtime measurements confirming 120 FPS at 2K on the target mobile hardware will also be tabulated with per-component breakdowns. revision: yes
-
Referee: [§3.3 Pruning] §3.3 Pruning and §3.1: the blendshape pruning threshold is listed as a free hyperparameter with no sensitivity analysis, no reported trade-off curves between pruned model size/FPS and reconstruction error, and no justification that the chosen threshold generalizes across subjects or poses. This directly affects the 'minimal blendshape representation' and mobile performance claims.
Authors: We accept that additional analysis is required. The revised §3.3 will contain sensitivity plots and trade-off curves relating pruning threshold to model size, FPS, and reconstruction error (using the quantitative metrics above). We will further evaluate the selected threshold on multiple subjects and varied pose sequences to demonstrate generalization. revision: yes
Circularity Check
No significant circularity detected
full rationale
The provided abstract and description present the method as an end-to-end trained pipeline that adopts local linear blendshapes based on an empirical observation about Gaussian correlations in body regions, followed by pruning and WebGPU implementation. No equations, self-citations, or derivations are quoted that reduce outputs to inputs by construction, rename known results, or import uniqueness from prior author work. The core claims rest on design choices and reported performance rather than any load-bearing self-referential step.
Axiom & Free-Parameter Ledger
free parameters (1)
- blendshape pruning threshold
axioms (1)
- domain assumption Nearby Gaussians within a local body region are highly correlated and can be linearly modeled with less error
Reference graph
Works this paper leans on
-
[1]
Scaffoldavatar: High-fidelity gaussian avatars with patch expressions
Shivangi Aneja, Sebastian Weiss, Irene Baeza, Prashanth Chandran, Gaspard Zoss, Matthias Nießner, and Derek Bradley. Scaffoldavatar: High-fidelity gaussian avatars with patch expressions. InProceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, pages 1–11, 2025. 3
work page 2025
-
[2]
Driving-signal aware full-body avatars.ACM Transactions on Graphics (TOG), 40(4):1–17,
Timur Bagautdinov, Chenglei Wu, Tomas Simon, Fabi ´an Prada, Takaaki Shiratori, Shih-En Wei, Weipeng Xu, Yaser Sheikh, and Jason Saragih. Driving-signal aware full-body avatars.ACM Transactions on Graphics (TOG), 40(4):1–17,
-
[3]
Morf: Mobile realistic fullbody avatars from a monocular video
Renat Bashirov, Alexey Larionov, Evgeniya Ustinova, Mikhail Sidorenko, David Svitov, Ilya Zakharkin, and Vic- tor Lempitsky. Morf: Mobile realistic fullbody avatars from a monocular video. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 3545–3555, 2024. 3
work page 2024
-
[4]
Detailed full-body reconstructions of moving peo- ple from monocular rgb-d sequences
Federica Bogo, Michael J Black, Matthew Loper, and Javier Romero. Detailed full-body reconstructions of moving peo- ple from monocular rgb-d sequences. InProceedings of the IEEE international conference on computer vision, pages 2300–2308, 2015. 2
work page 2015
-
[5]
Multilin- ear wavelets: A statistical shape space for human faces
Alan Brunton, Timo Bolkart, and Stefanie Wuhrer. Multilin- ear wavelets: A statistical shape space for human faces. In European Conference on Computer Vision, pages 297–312. Springer, 2014. 3
work page 2014
-
[6]
Jianchuan Chen, Jingchuan Hu, Gaige Wang, Zhonghua Jiang, Tiansong Zhou, Zhiwen Chen, and Chengfei Lv. Taoa- vatar: Real-time lifelike full-body talking avatars for aug- mented reality via 3d gaussian splatting. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10723–10734, 2025. 1, 3, 4, 5, 6, 7, 13
work page 2025
-
[7]
Uv volumes for real-time rendering of editable free-view human performance
Yue Chen, Xuan Wang, Xingyu Chen, Qi Zhang, Xiaoyu Li, Yu Guo, Jue Wang, and Fei Wang. Uv volumes for real-time rendering of editable free-view human performance. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16621–16631, 2023. 3
work page 2023
-
[8]
Meshavatar: Learning high-quality triangular human avatars from multi-view videos
Yushuo Chen, Zerong Zheng, Zhe Li, Chao Xu, and Yebin Liu. Meshavatar: Learning high-quality triangular human avatars from multi-view videos. InEuropean Conference on Computer Vision, pages 250–269. Springer, 2024. 3
work page 2024
-
[9]
High-quality streamable free-viewpoint video.ACM Transactions on Graphics (ToG), 34(4):1–13,
Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Den- nis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. High-quality streamable free-viewpoint video.ACM Transactions on Graphics (ToG), 34(4):1–13,
-
[10]
4d gaussian videos with motion lay- ering.ACM Transactions on Graphics (TOG), 44(4):1–14,
Pinxuan Dai, Peiquan Zhang, Zheng Dong, Ke Xu, Yifan Peng, Dandan Ding, Yujun Shen, Yin Yang, Xinguo Liu, Rynson WH Lau, et al. 4d gaussian videos with motion lay- ering.ACM Transactions on Graphics (TOG), 44(4):1–14,
-
[11]
Ram-avatar: Real-time photo-realistic avatar from monoc- ular videos with full-body control
Xiang Deng, Zerong Zheng, Yuxiang Zhang, Jingxiang Sun, Chao Xu, Xiaodong Yang, Lizhen Wang, and Yebin Liu. Ram-avatar: Real-time photo-realistic avatar from monoc- ular videos with full-body control. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1996–2007, 2024. 3
work page 1996
-
[12]
Learning neural volumetric representations of dy- namic humans in minutes
Chen Geng, Sida Peng, Zhen Xu, Hujun Bao, and Xiaowei Zhou. Learning neural volumetric representations of dy- namic humans in minutes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8759–8770, 2023. 3
work page 2023
-
[13]
Vid2avatar: 3d avatar reconstruction from videos in the wild via self-supervised scene decomposition
Chen Guo, Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Vid2avatar: 3d avatar reconstruction from videos in the wild via self-supervised scene decomposition. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12858–12868, 2023. 3
work page 2023
-
[14]
Vid2avatar-pro: Authentic avatar from videos in the wild via universal prior
Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, and Chen Cao. Vid2avatar-pro: Authentic avatar from videos in the wild via universal prior. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5559–5570, 2025. 3
work page 2025
-
[15]
Marc Habermann, Weipeng Xu, Michael Zollhoefer, Ger- ard Pons-Moll, and Christian Theobalt. Livecap: Real-time human performance capture from monocular video.ACM Transactions On Graphics (TOG), 38(2):1–17, 2019. 2
work page 2019
-
[16]
Real-time deep dynamic characters.ACM Transactions on Graphics (ToG), 40(4):1–16, 2021
Marc Habermann, Lingjie Liu, Weipeng Xu, Michael Zoll- hoefer, Gerard Pons-Moll, and Christian Theobalt. Real-time deep dynamic characters.ACM Transactions on Graphics (ToG), 40(4):1–16, 2021. 3
work page 2021
-
[17]
Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang, et al. Expres- sive gaussian human avatars from monocular rgb video.Ad- vances in Neural Information Processing Systems, 37:5646– 5660, 2024. 3
work page 2024
-
[18]
Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 634–644, 2024
work page 2024
-
[19]
Gauhuman: Articu- lated gaussian splatting from monocular human videos
Shoukang Hu, Tao Hu, and Ziwei Liu. Gauhuman: Articu- lated gaussian splatting from monocular human videos. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20418–20431, 2024. 3
work page 2024
-
[20]
Squeezeme: Mobile-ready distillation of gaussian full-body avatars
Forrest Iandola, Stanislav Pidhorskyi, Igor Santesteban, Divam Gupta, Anuj Pahuja, Nemanja Bartolovic, Frank Yu, Emanuel Garbin, Tomas Simon, and Shunsuke Saito. Squeezeme: Mobile-ready distillation of gaussian full-body avatars. InProceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, pages...
work page 2025
-
[21]
Mustafa Is ¸ık, Martin R ¨unz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, and Matthias Nießner. Humanrf: High-fidelity neural radiance fields for humans in motion.ACM Transactions on Graphics (TOG), 42(4):1–12, 2023. 3, 5
work page 2023
-
[22]
In- stantavatar: Learning avatars from monocular video in 60 seconds
Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. In- stantavatar: Learning avatars from monocular video in 60 seconds. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16922– 16932, 2023. 3
work page 2023
-
[23]
Neuman: Neural human radiance field from a single video
Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, and Anurag Ranjan. Neuman: Neural human radiance field from a single video. InEuropean Conference on Computer Vision, pages 402–418. Springer, 2022. 3
work page 2022
-
[24]
Hifi4g: High-fidelity human performance rendering via compact gaussian splatting
Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, and Lan Xu. Hifi4g: High-fidelity human performance rendering via compact gaussian splatting. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 19734–19745, 2024. 2
work page 2024
-
[25]
Topology-aware optimization of gaussian primitives for human-centric volumetric videos
Yuheng Jiang, Chengcheng Guo, Yize Wu, Yu Hong, Shengkun Zhu, Zhehao Shen, Yingliang Zhang, Shaohui Jiao, Zhuo Su, Lan Xu, et al. Topology-aware optimization of gaussian primitives for human-centric volumetric videos. InProceedings of the SIGGRAPH Asia 2025 Conference Pa- pers, pages 1–12, 2025. 2
work page 2025
-
[26]
Learning controls for blend shape based realistic fa- cial animation
Pushkar Joshi, Wen C Tien, Mathieu Desbrun, and Fr ´ed´eric Pighin. Learning controls for blend shape based realistic fa- cial animation. InACM Siggraph 2006 Courses, pages 17– es. 2006. 3
work page 2006
-
[27]
Eva: Expressive vir- tual avatars from multi-view videos
Hendrik Junkawitsch, Guoxing Sun, Heming Zhu, Chris- tian Theobalt, and Marc Habermann. Eva: Expressive vir- tual avatars from multi-view videos. InProceedings of the Special Interest Group on Computer Graphics and Interac- tive Techniques Conference Conference Papers, pages 1–11,
-
[28]
3d gaussian splatting for real-time radiance field rendering.ACM Trans
Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,
-
[29]
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. Hugs: Human gaussian splats. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 505–515, 2024. 3
work page 2024
-
[30]
Youngjoong Kwon, Lingjie Liu, Henry Fuchs, Marc Haber- mann, and Christian Theobalt. Deliffas: Deformable light fields for fast avatar synthesis.Advances in Neural Informa- tion Processing Systems, 36, 2024. 3
work page 2024
-
[31]
Gen- eralizable human gaussians for sparse view synthesis
Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella- Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, et al. Gen- eralizable human gaussians for sparse view synthesis. In European Conference on Computer Vision, pages 451–468. Springer, 2025. 2
work page 2025
-
[32]
Changmin Lee, Jihyun Lee, and Tae-Kyun Kim. Mpma- vatar: Learning 3d gaussian avatars with accurate and robust physics-based dynamics.arXiv preprint arXiv:2510.01619,
-
[33]
Gart: Gaussian articulated template mod- els
Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, and Kostas Daniilidis. Gart: Gaussian articulated template mod- els. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 19876–19887,
-
[34]
Rgbavatar: Reduced gaussian blendshapes for online modeling of head avatars
Linzhou Li, Yumeng Li, Yanlin Weng, Youyi Zheng, and Kun Zhou. Rgbavatar: Reduced gaussian blendshapes for online modeling of head avatars. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10747–10757, 2025. 3
work page 2025
-
[35]
Tava: Template-free animatable volumetric actors
Ruilong Li, Julian Tanke, Minh V o, Michael Zollh ¨ofer, J¨urgen Gall, Angjoo Kanazawa, and Christoph Lassner. Tava: Template-free animatable volumetric actors. InEu- ropean Conference on Computer Vision, pages 419–436. Springer, 2022. 3
work page 2022
-
[36]
Learning a model of facial shape and expression from 4d scans.ACM Trans
Tianye Li, Timo Bolkart, Michael J Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4d scans.ACM Trans. Graph., 36(6):194–1, 2017. 5
work page 2017
-
[37]
Neural 3d video synthesis from multi-view video
Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 5521–5531, 2022. 2
work page 2022
-
[38]
Posevocab: Learning joint-structured pose embeddings for human avatar modeling
Zhe Li, Zerong Zheng, Yuxiao Liu, Boyao Zhou, and Yebin Liu. Posevocab: Learning joint-structured pose embeddings for human avatar modeling. InACM SIGGRAPH 2023 Con- ference Proceedings, pages 1–11, 2023. 3
work page 2023
-
[39]
Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu. Ani- matable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19711–19722, 2024. 2, 3, 5, 6, 7, 13, 14
work page 2024
-
[40]
High-fidelity and real-time novel view synthesis for dynamic scenes
Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hu- jun Bao, and Xiaowei Zhou. High-fidelity and real-time novel view synthesis for dynamic scenes. InSIGGRAPH Asia 2023 Conference Papers, pages 1–9, 2023. 2
work page 2023
-
[41]
Creating your ed- itable 3d photorealistic avatar with tetrahedron-constrained gaussian splatting
Hanxi Liu, Yifang Men, and Zhouhui Lian. Creating your ed- itable 3d photorealistic avatar with tetrahedron-constrained gaussian splatting. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 15976–15986,
-
[42]
Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, and Christian Theobalt. Neural actor: Neural free-view synthesis of human actors with pose con- trol.ACM transactions on graphics (TOG), 40(6):1–16,
-
[43]
Texvo- cab: Texture vocabulary-conditioned human avatars
Yuxiao Liu, Zhe Li, Yebin Liu, and Haoqian Wang. Texvo- cab: Texture vocabulary-conditioned human avatars. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1715–1725, 2024. 3
work page 2024
-
[44]
Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. Smpl: a skinned multi- person linear model.ACM Transactions on Graphics (TOG), 34(6), 2015. 1, 3
work page 2015
-
[45]
Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, and Yaser Sheikh. Pixel codec avatars. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 64–73,
-
[46]
3d gaussian blendshapes for head avatar animation
Shengjie Ma, Yanlin Weng, Tianjia Shao, and Kun Zhou. 3d gaussian blendshapes for head avatar animation. InACM SIGGRAPH 2024 Conference Papers, pages 1–10, 2024. 3
work page 2024
-
[47]
Avatarwild: Fully controllable head avatars in the wild.Visual Informatics, 8 (3):96–106, 2024
Shaoxu Meng, Tong Wu, Fang-Lue Zhang, Shu-Yu Chen, Yuewen Ma, Wenbo Hu, and Lin Gao. Avatarwild: Fully controllable head avatars in the wild.Visual Informatics, 8 (3):96–106, 2024. 3
work page 2024
-
[48]
Nerf: Representing scenes as neural radiance fields for view syn- thesis
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InEuropean Conference on Computer Vision, pages 405–421. Springer, 2020. 2, 3
work page 2020
-
[49]
Expressive whole-body 3d gaussian avatar
Gyeongsik Moon, Takaaki Shiratori, and Shunsuke Saito. Expressive whole-body 3d gaussian avatar. InEuropean Conference on Computer Vision, pages 19–35. Springer,
-
[50]
Instant neural graphics primitives with a multires- olution hash encoding.ACM Trans
Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a multires- olution hash encoding.ACM Trans. Graph., 41(4):102:1– 102:15, 2022. 2
work page 2022
-
[51]
Local shape blending using coherent weighted regions.The Visual Computer, 27 (6):575–584, 2011
Kyung-Gun Na and Moon-Ryul Jung. Local shape blending using coherent weighted regions.The Visual Computer, 27 (6):575–584, 2011. 3
work page 2011
-
[52]
Sparse localized deformation components.ACM Transactions on Graphics (TOG), 32(6):1–10, 2013
Thomas Neumann, Kiran Varanasi, Stephan Wenger, Markus Wacker, Marcus Magnor, and Christian Theobalt. Sparse localized deformation components.ACM Transactions on Graphics (TOG), 32(6):1–10, 2013. 3
work page 2013
-
[53]
Compressed 3d gaussian splatting for accelerated novel view synthesis
Simon Niedermayr, Josef Stumpfegger, and R ¨udiger West- ermann. Compressed 3d gaussian splatting for accelerated novel view synthesis. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 10349–10358, 2024. 5
work page 2024
-
[54]
Nerfies: Deformable neural radiance fields
Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. InProceedings of the IEEE/CVF international conference on computer vision, pages 5865–5874, 2021. 2
work page 2021
-
[55]
Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin- Brualla, and Steven M Seitz. Hypernerf: a higher- dimensional representation for topologically varying neural radiance fields.ACM Transactions on Graphics (TOG), 40 (6):1–12, 2021. 2
work page 2021
-
[56]
Expressive body capture: 3d hands, face, and body from a single image
Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed AA Osman, Dimitrios Tzionas, and Michael J Black. Expressive body capture: 3d hands, face, and body from a single image. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10975–10985, 2019. 5
work page 2019
-
[57]
Ani- matable neural radiance fields for modeling dynamic human bodies
Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Ani- matable neural radiance fields for modeling dynamic human bodies. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 14314–14323, 2021. 3
work page 2021
-
[58]
Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9054–9063, 2021. 3
work page 2021
-
[59]
Relightable gaussian codec avatars
Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. Relightable gaussian codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 130–141, 2024. 3
work page 2024
-
[60]
Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting
Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1606–1616, 2024. 3, 15
work page 2024
-
[61]
Degas: Detailed expressions on full- body gaussian avatars
Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, and Zeyu Wang. Degas: Detailed expressions on full- body gaussian avatars. In2025 International Conference on 3D Vision (3DV), pages 1529–1540. IEEE, 2025. 3, 5
work page 2025
-
[62]
HRM 2Avatar: High-fidelity real-time mobile avatars from monocular phone scans
Chao Shi, Shenghao Jia, Jinhui Liu, Yong Zhang, Liangchao Zhu, Zhonglei Yang, Jinze Ma, Chaoyue Niu, and Chengfei Lv. HRM 2Avatar: High-fidelity real-time mobile avatars from monocular phone scans. InSIGGRAPH Asia 2025 Conference Papers (SA Conference Papers ’25), Hong Kong, Hong Kong, 2025. Association for Computing Machinery. 1, 3
work page 2025
-
[63]
Min Shi, Wenke Feng, Lin Gao, and Dengming Zhu. Gener- ating diverse clothed 3d human animations via a generative model.Computational Visual Media, 10(2):261–277, 2024. 2
work page 2024
-
[64]
Interactive region-based linear 3d face models.ACM Trans
J Rafael Tena, Fernando De la Torre, and Iain A Matthews. Interactive region-based linear 3d face models.ACM Trans. Graph., 30(4):76, 2011. 3
work page 2011
-
[65]
Jing Tong, Jin Zhou, Ligang Liu, Zhigeng Pan, and Hao Yan. Scanning 3d full human bodies using kinects.IEEE trans- actions on visualization and computer graphics, 18(4):643– 650, 2012. 2
work page 2012
-
[66]
Videorf: Ren- dering dynamic radiance fields as 2d feature video streams
Liao Wang, Kaixin Yao, Chengcheng Guo, Zhirui Zhang, Qiang Hu, Jingyi Yu, Lan Xu, and Minye Wu. Videorf: Ren- dering dynamic radiance fields as 2d feature video streams. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 470–481, 2024. 2
work page 2024
-
[67]
Arah: Animatable volume rendering of articulated hu- man sdfs
Shaofei Wang, Katja Schwarz, Andreas Geiger, and Siyu Tang. Arah: Animatable volume rendering of articulated hu- man sdfs. InEuropean conference on computer vision, pages 1–19. Springer, 2022. 3
work page 2022
-
[68]
Relightable full-body gaussian codec avatars
Shaofei Wang, Tomas Simon, Igor Santesteban, Timur Bagautdinov, Junxuan Li, Vasu Agrawal, Fabian Prada, Shoou-I Yu, Pace Nalbone, Matt Gramlich, et al. Relightable full-body gaussian codec avatars. InProceedings of the Special Interest Group on Computer Graphics and Interac- tive Techniques Conference Conference Papers, pages 1–12,
-
[69]
Freetimegs: Free gaussian primitives at anytime any- where for dynamic scene reconstruction
Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhan- hua Zhang, Yong Chen, Hujun Bao, Sida Peng, and Xiaowei Zhou. Freetimegs: Free gaussian primitives at anytime any- where for dynamic scene reconstruction. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21750–21760, 2025. 2
work page 2025
-
[70]
Gomavatar: Efficient an- imatable human modeling from monocular video using gaussians-on-mesh
Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G Schwing, and Shenlong Wang. Gomavatar: Efficient an- imatable human modeling from monocular video using gaussians-on-mesh. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 2059–2069, 2024. 3
work page 2059
-
[71]
Hu- mannerf: Free-viewpoint rendering of moving people from monocular video
Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. Hu- mannerf: Free-viewpoint rendering of moving people from monocular video. InProceedings of the IEEE/CVF con- ference on computer vision and pattern Recognition, pages 16210–16220, 2022. 3
work page 2022
-
[72]
Chenglei Wu, Derek Bradley, Markus Gross, and Thabo Beeler. An anatomically-constrained local deformation model for monocular face capture.ACM transactions on graphics (TOG), 35(4):1–12, 2016. 3
work page 2016
-
[73]
Tetrirf: Temporal tri-plane radiance fields for efficient free-viewpoint video
Minye Wu, Zehao Wang, Georgios Kouros, and Tinne Tuyte- laars. Tetrirf: Temporal tri-plane radiance fields for efficient free-viewpoint video. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 6487–6496, 2024. 2
work page 2024
-
[74]
Recent advances in 3d gaussian splatting.Computational Visual Media, 10(4):613– 642, 2024
Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan- Pei Cao, Ling-Qi Yan, and Lin Gao. Recent advances in 3d gaussian splatting.Computational Visual Media, 10(4):613– 642, 2024. 2
work page 2024
-
[75]
Se- quential gaussian avatars with hierarchical motion context
Wangze Xu, Yifan Zhan, Zhihang Zhong, and Xiao Sun. Se- quential gaussian avatars with hierarchical motion context. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 13592–13603, 2025. 3
work page 2025
-
[76]
4k4d: Real-time 4d view synthesis at 4k resolution
Zhen Xu, Sida Peng, Haotong Lin, Guangzhao He, Jiaming Sun, Yujun Shen, Hujun Bao, and Xiaowei Zhou. 4k4d: Real-time 4d view synthesis at 4k resolution. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20029–20040, 2024. 2
work page 2024
-
[77]
Zhen Xu, Yinghao Xu, Zhiyuan Yu, Sida Peng, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Representing long volumet- ric video with temporal gaussian hierarchy.ACM Transac- tions on Graphics (TOG), 43(6):1–18, 2024. 2
work page 2024
-
[78]
Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. gsplat: An open-source library for gaussian splatting.Journal of Ma- chine Learning Research, 26(34):1–17, 2025. 4
work page 2025
-
[79]
Bowei Yin, Junke Zhu, and Zhangjin Huang. Cyclegaus- sianavatar: Encoding of facial details with the cycle consis- tency framework.Visual Informatics, page 100264, 2025. 3
work page 2025
-
[80]
Monohuman: Animatable human neu- ral field from monocular video
Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin. Monohuman: Animatable human neu- ral field from monocular video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16943–16953, 2023. 3
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.