Bringing Your Portrait to 3D Presence
Pith reviewed 2026-05-17 04:33 UTC · model grok-4.3
The pith
A unified framework turns a single portrait into an animatable 3D human avatar across head, half-body, and full-body scales.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework combines three components. A Dual-UV representation maps image features to a canonical UV space through Core-UV and Shell-UV branches, removing pose and framing effects. A factorized synthetic data manifold merges 2D generative diversity with 3D-consistent renderings, backed by a training scheme that improves realism and identity consistency. A robust proxy-mesh tracker keeps estimation stable under partial visibility. Together, these components yield strong in-the-wild generalization. When trained exclusively on half-body synthetic data, the model attains state-of-the-art results for head and upper-body reconstruction while remaining competitive for full-body cases.
What carries the argument
Dual-UV representation with Core-UV and Shell-UV branches that map image features to a canonical UV space to eliminate pose- and framing-induced shifts.
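To make the idea of "mapping image features to a canonical UV space" concrete, here is a minimal sketch of a generic UV gather. It is not the paper's Core-UV or Shell-UV design, whose details are not given here. It assumes a hypothetical input `uv_pixels` that gives, for each UV texel, the pixel location where that texel's surface point on a proxy mesh projects into the image, plus a hypothetical `visible` mask. The function name and all parameters are illustrative.

```python
import numpy as np

def sample_features_to_uv(feat, uv_pixels, visible):
    """Bilinearly sample an image feature map at per-texel pixel locations.

    feat:      (H, W, C) image feature map.
    uv_pixels: (U, V, 2) hypothetical (x, y) pixel coordinates where each
               UV texel's surface point projects into the image.
    visible:   (U, V) boolean mask; texels that are not visible get zeros.
    """
    H, W, _ = feat.shape
    # Clamp sample locations to the image bounds.
    x = np.clip(uv_pixels[..., 0], 0.0, W - 1.0)
    y = np.clip(uv_pixels[..., 1], 0.0, H - 1.0)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    uv_feat = top * (1 - wy) + bottom * wy
    return np.where(visible[..., None], uv_feat, 0.0)

# Toy example: a 4x4 feature map whose two channels store x and y.
ys, xs = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij")
feat = np.stack([xs, ys], axis=-1)
uv_pixels = np.array([[[1.5, 2.25], [0.0, 0.0]],
                      [[3.0, 3.0], [2.0, 1.0]]])
visible = np.array([[True, True], [True, False]])
uv_feat = sample_features_to_uv(feat, uv_pixels, visible)
```

Because every texel always refers to the same surface location, the resulting grid does not shift when the person moves or the crop changes, which is the property the summary attributes to the canonical UV space.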
If this is right
- Reconstruction becomes possible from single images rather than requiring multiple views or videos.
- The model generalizes from synthetic half-body training to real-world full-body portraits.
- Animatable avatars can be produced at different body scales with one unified approach.
- Proxy mesh estimation remains stable even with incomplete visibility in the input.
- Reliance on real 3D scanned data for training is reduced through the synthetic manifold.
Where Pith is reading between the lines
- If the Dual-UV mapping proves robust, it could be adapted for reconstructing other dynamic objects like animals or clothing from single views.
- The factorized data approach might enable easy scaling to new identities by swapping in different generative models without retraining the full system.
- Competitive full-body performance suggests potential for extension to complete body animation including legs and hands with minimal additional data.
- Strong in-the-wild results imply applications in mobile apps for quick avatar creation from selfies.
Load-bearing premise
The factorized synthetic data manifold combined with the described training scheme provides enough realism and identity consistency to support strong in-the-wild generalization despite training exclusively on half-body synthetic data.
What would settle it
Run the model on a diverse set of real in-the-wild portraits with unusual poses, framings, and demographics, and measure reconstruction quality against ground-truth 3D models. The generalization claim would be falsified if errors on these real portraits exceed those on the synthetic test sets.
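A minimal sketch of how such a check could be scored, assuming reconstructions and ground truth are available as sampled point sets. The symmetric Chamfer distance is one common geometry metric; the source does not say which metric would be used, and the function names here are illustrative.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def generalization_gap(real_errors, synthetic_errors):
    """Mean error on real portraits minus mean error on synthetic tests."""
    return float(np.mean(real_errors) - np.mean(synthetic_errors))

# Toy example: two points, and the same points shifted by 0.5 along z.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = pred + np.array([0.0, 0.0, 0.5])
cd = chamfer_distance(pred, gt)
gap = generalization_gap([0.3, 0.5], [0.2, 0.2])
```

A positive gap means the real portraits are reconstructed worse than the synthetic ones, which is the condition the paragraph above treats as evidence against the generalization claim.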
Original abstract
We present a unified framework for reconstructing animatable 3D human avatars from a single portrait across head, half-body, and full-body inputs. Our method tackles three bottlenecks: pose- and framing-sensitive feature representations, limited scalable data, and unreliable proxy-mesh estimation. We introduce a Dual-UV representation that maps image features to a canonical UV space via Core-UV and Shell-UV branches, eliminating pose- and framing-induced token shifts. We also build a factorized synthetic data manifold combining 2D generative diversity with geometry-consistent 3D renderings, supported by a training scheme that improves realism and identity consistency. A robust proxy-mesh tracker maintains stability under partial visibility. Together, these components enable strong in-the-wild generalization. Trained only on half-body synthetic data, our model achieves state-of-the-art head and upper-body reconstruction and competitive full-body results. Extensive experiments and analyses further validate the effectiveness of our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a unified framework for reconstructing animatable 3D human avatars from a single portrait image, applicable to head, half-body, and full-body inputs. It introduces a Dual-UV representation with Core-UV and Shell-UV branches to map features to canonical space, a factorized synthetic data manifold combining 2D generative diversity with 3D-consistent renderings, and a robust proxy-mesh tracker for stability under partial visibility. The central claim is that training exclusively on half-body synthetic data enables state-of-the-art head and upper-body reconstruction, competitive full-body results, and strong in-the-wild generalization.
Significance. If the generalization claims hold with supporting evidence, the work could meaningfully advance single-image 3D avatar reconstruction by mitigating data scarcity and proxy estimation issues through synthetic factorization and architectural innovations. The Dual-UV approach and training scheme offer a potentially reusable strategy for handling pose/framing variations. However, the overall significance is limited by the absence of direct quantitative validation for the synthetic-to-real transfer on full-body cases.
major comments (3)
- [Abstract and §5] Abstract and §5 (Experiments): The claim that the model 'achieves state-of-the-art head and upper-body reconstruction and competitive full-body results' when trained only on half-body synthetic data is not accompanied by any quantitative metrics, ablation studies, error bars, or baseline comparisons. Without these in the experiments, it is impossible to determine whether the data support the stated performance claims or to attribute gains to the Dual-UV branches versus the data manifold.
- [§4.3] §4.3 (Data manifold and training scheme): The central generalization claim—that the factorized synthetic data manifold plus Core-UV/Shell-UV training produces sufficient realism and identity consistency for in-the-wild full-body inputs—rests on an untested transfer. Half-body data inherently lacks lower-body pose/occlusion statistics, and no ablation isolates the manifold's contribution on real full-body test images; if this transfer fails, the SOTA and competitive results cannot be credited to the proposed components.
- [§4.4] §4.4 (Proxy-mesh tracker): The robust proxy-mesh tracker is presented as solving unreliable estimation under partial visibility, yet no quantitative evaluation (e.g., stability metrics or failure rates versus baselines on occluded full-body cases) is reported. This component is load-bearing for the full-body results but lacks the evidence needed to confirm its contribution.
minor comments (2)
- [Figure 2] Figure 2: The Dual-UV visualization would benefit from explicit arrows or labels clarifying how image features are mapped through the Core-UV and Shell-UV branches to the canonical space.
- [§3.2] §3.2: The notation for the factorized synthetic data manifold could be formalized with an equation defining the combination of 2D generative diversity and geometry-consistent 3D renderings.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. The comments highlight important opportunities to strengthen the quantitative support for our claims. We address each major comment point by point below and outline the revisions we will make.
- Referee: [Abstract and §5] The claim that the model 'achieves state-of-the-art head and upper-body reconstruction and competitive full-body results' when trained only on half-body synthetic data is not accompanied by any quantitative metrics, ablation studies, error bars, or baseline comparisons. Without these in the experiments, it is impossible to determine whether the data support the stated performance claims or to attribute gains to the Dual-UV branches versus the data manifold.
  Authors: We appreciate this observation. Our current experiments emphasize qualitative visual comparisons and in-the-wild generalization results, which we believe demonstrate the effectiveness of the approach. To provide more rigorous validation, we will add quantitative metrics (e.g., PSNR, SSIM, LPIPS) on synthetic test sets, baseline comparisons, and ablations isolating the Dual-UV and data manifold contributions, including error bars from repeated runs. These will be incorporated into the revised manuscript.
  Revision: yes
- Referee: [§4.3] The central generalization claim, that the factorized synthetic data manifold plus Core-UV/Shell-UV training produces sufficient realism and identity consistency for in-the-wild full-body inputs, rests on an untested transfer. Half-body data inherently lacks lower-body pose/occlusion statistics, and no ablation isolates the manifold's contribution on real full-body test images; if this transfer fails, the SOTA and competitive results cannot be credited to the proposed components.
  Authors: The factorized data manifold combines 2D generative diversity with 3D-consistent renderings precisely to support generalization beyond the half-body training distribution, with the Dual-UV representation further mitigating pose and framing variations. We agree that an explicit ablation on real full-body inputs would strengthen attribution of the results. In the revision we will add such an ablation evaluating the manifold's isolated contribution on real full-body test cases.
  Revision: yes
- Referee: [§4.4] The robust proxy-mesh tracker is presented as solving unreliable estimation under partial visibility, yet no quantitative evaluation (e.g., stability metrics or failure rates versus baselines on occluded full-body cases) is reported. This component is load-bearing for the full-body results but lacks the evidence needed to confirm its contribution.
  Authors: We acknowledge that quantitative evidence for the proxy-mesh tracker's robustness would better substantiate its role. We will add stability metrics (e.g., average vertex displacement and failure rates under occlusion) and comparisons against baseline trackers on occluded full-body cases in the experiments section of the revised manuscript.
  Revision: yes
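Some of the metrics promised in the responses above can be sketched in a few lines. The following is a minimal illustration, assuming images are float arrays scaled to [0, 1] and tracked meshes share vertex order with a reference. The per-frame failure criterion and its threshold are hypothetical choices, not taken from the paper, and SSIM and LPIPS are omitted because they need dedicated implementations.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mean_vertex_displacement(tracked, reference):
    """Mean Euclidean distance per vertex; inputs have shape (T, V, 3)."""
    return float(np.linalg.norm(tracked - reference, axis=-1).mean())

def failure_rate(tracked, reference, threshold):
    """Fraction of frames whose mean vertex error exceeds `threshold`.

    The threshold is an assumed, user-chosen value in mesh units.
    """
    per_frame = np.linalg.norm(tracked - reference, axis=-1).mean(axis=1)
    return float((per_frame > threshold).mean())

# Toy example: a constant 0.1 error image, and a two-frame mesh track in
# which the second frame is displaced by a vector of length 5.
image_psnr = psnr(np.zeros((8, 8, 3)), np.full((8, 8, 3), 0.1))
reference = np.zeros((2, 3, 3))
tracked = reference.copy()
tracked[1] += np.array([3.0, 4.0, 0.0])
avg_disp = mean_vertex_displacement(tracked, reference)
fail = failure_rate(tracked, reference, threshold=1.0)
```

Reporting these per dataset split (for example occluded versus unoccluded inputs) and alongside baseline trackers would address the attribution concern raised by the referee.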
Circularity Check
Novel components and data scheme presented without self-referential reductions or fitted predictions
The paper introduces Dual-UV representation (Core-UV and Shell-UV branches), a factorized synthetic data manifold, and a robust proxy-mesh tracker as new elements to address pose/framing issues, data scalability, and proxy estimation. These are described as enabling strong in-the-wild generalization from half-body synthetic training data to head/upper-body SOTA and competitive full-body results. No equations, predictions, or central claims reduce by construction to fitted parameters, self-definitions, or self-citation chains. Extensive experiments are cited as independent validation, making the derivation self-contained against external benchmarks with only minor self-citation risk at most.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: A factorized synthetic data manifold can combine 2D generative diversity with geometry-consistent 3D renderings to improve realism and identity consistency.
invented entities (2)
- Dual-UV representation: no independent evidence
- Robust proxy-mesh tracker: no independent evidence
Forward citations
Cited by 1 Pith paper
- UIKA: Fast Universal Head Avatar from Pose-Free Images
  UIKA is a feed-forward animatable Gaussian head model using UV-guided correspondence estimation and learnable UV tokens with dual-level attention, trained on large-scale synthetic data to handle pose-free inputs.