UIKA: Fast Universal Head Avatar from Pose-Free Images
Pith reviewed 2026-05-22 11:57 UTC · model grok-4.3
The pith
UIKA creates animatable Gaussian head avatars from any number of pose-free images via a single forward pass.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UIKA is a feed-forward animatable Gaussian head model that processes any number of pose-free images by associating each image with a pixel-wise facial correspondence estimate. These correspondences allow valid pixel colors to be reprojected from screen space to UV space, which is independent of camera pose and expression. Learnable UV tokens, on which attention is applied at both the screen and UV levels, aggregate information across inputs and are decoded into canonical Gaussian attributes. A large-scale, identity-rich synthetic dataset is used to train this large avatar model.
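The following minimal NumPy sketch makes the screen-to-UV reprojection concrete. It is not the authors' implementation. It assumes the correspondence estimator outputs a UV coordinate in [0, 1] for every pixel, together with a validity mask. The function names `reproject_to_uv` and `aggregate_views` are hypothetical, and so are the nearest-texel rounding and the plain averaging across views.

```python
import numpy as np

def reproject_to_uv(image, uv_map, valid, uv_res):
    """Scatter screen-space pixel colors into a UV texture (hypothetical sketch).

    image:  (H, W, 3) float colors
    uv_map: (H, W, 2) per-pixel UV coordinates in [0, 1], standing in for the
            output of a pixel-wise facial correspondence estimator (assumed)
    valid:  (H, W) bool mask of pixels that land on the face surface
    Returns per-texel color sums and hit counts so that several views can be
    accumulated before normalizing.
    """
    colors = image[valid]                          # (P, 3)
    uv = uv_map[valid]                             # (P, 2)
    # Nearest-texel lookup: u indexes columns, v indexes rows.
    cols = np.clip((uv[:, 0] * uv_res).astype(int), 0, uv_res - 1)
    rows = np.clip((uv[:, 1] * uv_res).astype(int), 0, uv_res - 1)
    color_sum = np.zeros((uv_res, uv_res, 3))
    counts = np.zeros((uv_res, uv_res))
    np.add.at(color_sum, (rows, cols), colors)     # unbuffered scatter-add
    np.add.at(counts, (rows, cols), 1.0)
    return color_sum, counts

def aggregate_views(views, uv_res):
    """Average UV colors over any number of (image, uv_map, valid) views."""
    total = np.zeros((uv_res, uv_res, 3))
    hits = np.zeros((uv_res, uv_res))
    for image, uv_map, valid in views:
        s, c = reproject_to_uv(image, uv_map, valid, uv_res)
        total += s
        hits += c
    # Texels never observed by any view stay zero and are flagged as uncovered.
    texture = np.where(hits[..., None] > 0, total / np.maximum(hits, 1)[..., None], 0.0)
    return texture, hits > 0
```

The sketch accumulates colors as sums and hit counts, so views can be added one at a time. This is one simple way to accept an arbitrary number of inputs. Texels that no view observes remain empty, and the learned model would have to fill them.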
What carries the argument
The UV-guided avatar modeling strategy, where pixel-wise facial correspondence enables reprojection to pose-independent UV space, combined with learnable UV tokens for attention-based aggregation across inputs.
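Aggregation by learnable UV tokens can be illustrated with single-head cross-attention, in which the UV tokens act as queries over screen tokens pooled from every view. This is a hypothetical sketch. The shapes, the weight matrices `w_q`, `w_k`, `w_v`, and the residual update are assumptions. It also omits the UV-level attention and the multi-head, multi-layer structure that a real model would use.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def uv_tokens_attend(uv_tokens, view_features, w_q, w_k, w_v):
    """Single-head cross-attention from learnable UV tokens to screen tokens.

    uv_tokens:     (N, d) learnable embeddings, one per UV patch (hypothetical)
    view_features: list of (M_i, d) screen-level tokens, one array per view;
                   the list may have any length, so the input count is free
    """
    screen = np.concatenate(view_features, axis=0)   # (sum M_i, d)
    q = uv_tokens @ w_q
    k = screen @ w_k
    v = screen @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, sum M_i), rows sum to 1
    return uv_tokens + attn @ v                      # residual update
```

The sketch concatenates the screen tokens of all views. As a result, the update does not depend on view order and the number of views can vary. The cost grows linearly with the total number of screen tokens.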
If this is right
- Supports creation of avatars from a single image or smartphone videos without requiring pose information.
- Outperforms existing approaches in both monocular and multi-view settings.
- Produces, after training on synthetic data, a single universal model whose avatars can be animated.
- Replaces long optimization processes with a single forward pass.
Where Pith is reading between the lines
- Extending the UV token approach to other body parts could enable full-body avatars from casual captures.
- Improving correspondence estimation accuracy might further boost performance on challenging expressions.
- The reliance on synthetic data suggests that domain adaptation techniques could help the model better handle real-world lighting variations.
Load-bearing premise
The method depends on having accurate pixel-wise facial correspondence estimation for each input image to enable color reprojection to UV space.
What would settle it
The central claim would be falsified if the generated avatars showed significant artifacts, or failed to animate correctly, whenever the correspondence maps for inputs with varying expressions are imprecise.
Original abstract
We present UIKA, a feed-forward animatable Gaussian head model from an arbitrary number of pose-free inputs, including a single image, multi-view captures, and smartphone-captured videos. Unlike traditional avatar methods, which require a studio-level multi-view capture system and reconstruct a person-specific model through a lengthy optimization process, we rethink the task through the lenses of model representation, network design, and data preparation. First, we introduce a UV-guided avatar modeling strategy, in which each input image is associated with a pixel-wise facial correspondence estimation. Such correspondence estimation allows us to reproject each valid pixel color from screen space to UV space, which is independent of camera pose and character expression. Furthermore, we design learnable UV tokens on which the attention mechanism can be applied at both the screen and UV levels. The learned UV tokens can be decoded into canonical Gaussian attributes using aggregated UV information from all input views. To train our large avatar model, we additionally prepare a large-scale, identity-rich synthetic training dataset. Our method significantly outperforms existing approaches in both monocular and multi-view settings.
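The sketch below shows one way aggregated tokens might be decoded into canonical Gaussian attributes. The abstract does not specify any of these details, so all of them are assumptions: the linear head `w_head`, the per-texel template positions `base_xyz`, and the activations. The activations follow a convention common in 3D Gaussian splatting: exponential for scales, normalized quaternions for rotations, and sigmoid for opacity and color.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_gaussians(token_feats, base_xyz, w_head):
    """Map per-texel features to canonical 3D Gaussian attributes (hypothetical).

    token_feats: (N, d) features after UV-level aggregation
    base_xyz:    (N, 3) canonical surface point for each UV sample, assumed to
                 come from a template head mesh
    w_head:      (d, 14) hypothetical linear decoding head
    """
    raw = token_feats @ w_head
    # 3 offset + 3 log-scale + 4 quaternion + 1 opacity + 3 color = 14 channels.
    offset, log_scale, quat, opacity, rgb = np.split(raw, [3, 6, 10, 11], axis=1)
    quat = quat / np.maximum(np.linalg.norm(quat, axis=1, keepdims=True), 1e-8)
    return {
        "xyz": base_xyz + offset,       # offsets around the template surface
        "scale": np.exp(log_scale),     # strictly positive scales
        "rotation": quat,               # unit quaternions
        "opacity": sigmoid(opacity),    # in (0, 1)
        "rgb": sigmoid(rgb),            # in (0, 1)
    }
```

Anchoring each Gaussian to a UV sample gives the decoder a fixed number of outputs, whatever the number of input views.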
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces UIKA, a feed-forward animatable Gaussian head avatar model that accepts an arbitrary number of pose-free inputs (single image, multi-view captures, or smartphone videos). It proposes a UV-guided modeling strategy that associates each input with pixel-wise facial correspondence maps to reproject screen-space colors into a pose- and expression-independent UV space, aggregates information via learnable UV tokens and attention at both screen and UV levels, and decodes the tokens into canonical Gaussian attributes. The model is trained on a large-scale, identity-rich synthetic dataset, and the paper claims to significantly outperform existing methods in monocular and multi-view settings.
Significance. If the performance claims and underlying assumptions are rigorously validated, the work could enable practical, optimization-free avatar creation from casual captures, advancing universal head modeling for AR/VR and animation applications. The combination of UV-space reprojection with attention-based aggregation and synthetic data training represents a promising direction for handling variable input counts without per-subject optimization.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): The central claim that the method 'significantly outperforms existing approaches in both monocular and multi-view settings' is asserted without quantitative metrics, specific baselines, error analysis, ablation studies, or statistical significance tests. This prevents verification of the outperformance and is load-bearing for the paper's primary contribution.
- [§3.1] §3.1 (UV-guided avatar modeling strategy): The reprojection of screen-space colors to UV space via per-image pixel-wise facial correspondence maps is presented as enabling consistent canonical Gaussians, yet no quantitative evaluation of correspondence accuracy, failure cases under expression/identity variation, or ablation on map quality is provided. Errors in these maps would directly corrupt aggregated UV tokens and the feed-forward reconstruction, making this assumption critical to the monocular and multi-view claims.
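A toy ablation, in the spirit of the requested map-quality study, can probe the concern about error propagation. This hypothetical sketch samples a known UV texture at exact texel coordinates. It then scatters the colors back through coordinates jittered by Gaussian noise and reports the resulting error. The noise model and the metric are illustrative choices, not the paper's protocol.

```python
import numpy as np

def reprojection_error(texture, noise_texels, seed=0):
    """Hypothetical ablation: how UV-coordinate noise corrupts reprojection.

    A 'screen' is formed by sampling `texture` (res, res, 3) at ground-truth
    texel coordinates; colors are scattered back using coordinates jittered by
    Gaussian noise with std `noise_texels` (in texels). Returns the MSE over
    texels that received at least one color.
    """
    rng = np.random.default_rng(seed)
    res = texture.shape[0]
    rows, cols = np.meshgrid(np.arange(res), np.arange(res), indexing="ij")
    rows, cols = rows.ravel(), cols.ravel()
    colors = texture[rows, cols]                      # perfect screen samples
    # Perturb the correspondence, then round to the nearest valid texel.
    noisy_r = np.clip(np.rint(rows + rng.normal(0.0, noise_texels, rows.shape)), 0, res - 1).astype(int)
    noisy_c = np.clip(np.rint(cols + rng.normal(0.0, noise_texels, cols.shape)), 0, res - 1).astype(int)
    acc = np.zeros(texture.shape, dtype=float)
    cnt = np.zeros((res, res))
    np.add.at(acc, (noisy_r, noisy_c), colors)
    np.add.at(cnt, (noisy_r, noisy_c), 1.0)
    hit = cnt > 0
    recon = acc[hit] / cnt[hit][:, None]
    return float(np.mean((recon - texture[hit]) ** 2))
```

With zero noise, the sketch recovers the texture exactly. Any jitter moves colors to the wrong texels, which is the kind of corruption that would reach the aggregated UV tokens.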
minor comments (2)
- [§3.2] Notation for 'learnable UV tokens' and their attention application at screen vs. UV levels could be formalized with equations to improve clarity of the aggregation process.
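One possible way to formalize the two attention levels is sketched below. The notation is hypothetical and is not taken from the paper:

- $\mathbf{T}\in\mathbb{R}^{N\times d}$ denotes the learnable UV tokens.
- $\mathbf{F}_v\in\mathbb{R}^{M_v\times d}$ denotes the screen tokens of view $v$, out of $V$ views.
- $\mathbf{U}_v\in\mathbb{R}^{N\times d}$ denotes the features reprojected from view $v$ into UV space.
- $\mathbf{m}_v\in\{0,1\}^{N}$ is the corresponding validity mask, broadcast across channels.
- $[\cdot\,;\cdot]$ is row-wise concatenation, and $\oslash$ is elementwise division.
- $\operatorname{Dec}$ is a decoder that produces the set of canonical Gaussians $\mathcal{G}$.

```latex
\begin{aligned}
\operatorname{Attn}(\mathbf{X}, \mathbf{Y}) &= \operatorname{softmax}\!\left(\frac{\mathbf{X}\mathbf{W}_Q (\mathbf{Y}\mathbf{W}_K)^{\top}}{\sqrt{d}}\right)\mathbf{Y}\mathbf{W}_V, \\
\mathbf{T}' &= \mathbf{T} + \operatorname{Attn}\!\left(\mathbf{T}, [\mathbf{F}_1; \dots; \mathbf{F}_V]\right) \quad \text{(screen level)}, \\
\bar{\mathbf{U}} &= \Big(\sum_{v=1}^{V} \mathbf{m}_v \odot \mathbf{U}_v\Big) \oslash \max\!\Big(\sum_{v=1}^{V} \mathbf{m}_v,\, 1\Big), \\
\mathbf{T}'' &= \mathbf{T}' + \operatorname{Attn}\!\left(\mathbf{T}' + \bar{\mathbf{U}},\, \mathbf{T}' + \bar{\mathbf{U}}\right) \quad \text{(UV level)}, \\
\mathcal{G} &= \operatorname{Dec}(\mathbf{T}'').
\end{aligned}
```

Both steps remain well defined for any $V \ge 1$, because the screen tokens are concatenated and the reprojected features are averaged under the masks.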
- [Abstract] The abstract mentions 'associated with a pixel-wise facial correspondence estimation' without specifying the source or method used to obtain these maps on arbitrary inputs.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the emphasis on strengthening the quantitative validation of our claims and the robustness of the UV-guided modeling assumptions. Below we provide point-by-point responses to the major comments. We will incorporate the suggested additions in the revised version to improve clarity and verifiability.
Point-by-point responses
- Referee: [Abstract and §4] Abstract and §4 (Experiments): The central claim that the method 'significantly outperforms existing approaches in both monocular and multi-view settings' is asserted without quantitative metrics, specific baselines, error analysis, ablation studies, or statistical significance tests. This prevents verification of the outperformance and is load-bearing for the paper's primary contribution.
Authors: We acknowledge that the abstract summarizes the key finding and that §4 would benefit from more explicit quantitative support to allow direct verification. The current experiments section includes comparisons against existing methods, but we agree that additional detail is warranted. In the revision we will expand §4 with dedicated tables reporting specific metrics (e.g., PSNR, SSIM, LPIPS), list the exact baselines used, include error analysis and ablation studies on core components, and add statistical significance tests where appropriate. These changes will make the outperformance claim fully substantiated and easier to evaluate. (Revision: yes.)
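PSNR and a per-image paired significance test are simple enough to sketch for the promised quantitative tables. The code below is a minimal illustration under assumed conventions: images are scaled to [0, 1], and each method has one score per test image. SSIM and LPIPS would normally come from existing implementations, such as `skimage.metrics.structural_similarity` and the `lpips` package. The sign-flip permutation test is one reasonable choice among several.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

def paired_permutation_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on per-image metric differences.

    Under the null hypothesis the sign of each paired difference is arbitrary.
    Returns the observed mean difference (a - b) and an approximate p-value.
    """
    diff = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    observed = diff.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    # The +1 terms count the observed statistic itself, avoiding p = 0.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p)
```

The test pairs scores by image. Variation in difficulty across test images therefore cancels, which is usually more sensitive than comparing two unpaired means.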
- Referee: [§3.1] §3.1 (UV-guided avatar modeling strategy): The reprojection of screen-space colors to UV space via per-image pixel-wise facial correspondence maps is presented as enabling consistent canonical Gaussians, yet no quantitative evaluation of correspondence accuracy, failure cases under expression/identity variation, or ablation on map quality is provided. Errors in these maps would directly corrupt aggregated UV tokens and the feed-forward reconstruction, making this assumption critical to the monocular and multi-view claims.
Authors: We agree that a dedicated quantitative assessment of the correspondence maps is important given their central role in the pipeline. The present manuscript demonstrates the overall effectiveness through end-to-end results and qualitative examples, but does not isolate correspondence accuracy. In the revised version we will add an evaluation of correspondence quality (using available ground-truth landmarks on synthetic data), a discussion of observed failure cases under large expression and identity changes, and an ablation that measures the impact of map quality on final Gaussian reconstruction metrics. This will directly address the concern about error propagation. (Revision: yes.)
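One way to evaluate correspondence accuracy on synthetic data is to locate each landmark through the predicted UV map and compare it with its ground-truth pixel position. This hypothetical sketch assumes that each landmark has a known canonical UV coordinate. It reports the normalized mean error (NME), normalized by the inter-ocular distance. The nearest-UV lookup and the normalization are illustrative choices.

```python
import numpy as np

def landmark_nme(uv_map, valid, landmark_uv, landmark_px, left_eye, right_eye):
    """Normalized mean error of landmarks located through a UV correspondence map.

    uv_map:      (H, W, 2) predicted per-pixel UV coordinates
    valid:       (H, W) bool face mask
    landmark_uv: (L, 2) canonical UV coordinate of each landmark, assumed known
                 from the template used to render the synthetic data
    landmark_px: (L, 2) ground-truth pixel positions (x, y) of the landmarks
    left_eye, right_eye: indices of the landmarks used for normalization
    """
    ys, xs = np.nonzero(valid)
    pred_uv = uv_map[ys, xs]                                    # (P, 2)
    # For each landmark, pick the pixel whose predicted UV is closest.
    d = np.linalg.norm(pred_uv[None, :, :] - landmark_uv[:, None, :], axis=-1)  # (L, P)
    best = d.argmin(axis=1)
    located = np.stack([xs[best], ys[best]], axis=1).astype(float)
    errors = np.linalg.norm(located - landmark_px, axis=1)
    interocular = np.linalg.norm(landmark_px[left_eye] - landmark_px[right_eye])
    return float(errors.mean() / interocular)
```

A perfect map gives an NME of zero. Rerunning the metric on maps perturbed by increasing amounts would give the x-axis for the proposed ablation on map quality.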
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper's core pipeline—UV-guided reprojection via assumed pixel-wise facial correspondence maps, learnable UV tokens with cross-level attention, aggregation into canonical Gaussian attributes, and training on externally prepared synthetic identity-rich data—does not reduce any claimed prediction or output to quantities defined by the inputs or by self-citation. The correspondence estimation is treated as an available input rather than derived internally, and performance claims rest on architectural and data choices without tautological fitting or renaming of prior results. This is the common case of an independent feed-forward model.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Pixel-wise facial correspondence estimation can be performed reliably on arbitrary input images to reproject colors to UV space independent of pose and expression.