Recognition: no theorem link
AvatarPointillist: AutoRegressive 4D Gaussian Avatarization
Pith reviewed 2026-05-10 18:44 UTC · model grok-4.3
The pith
A decoder-only Transformer autoregressively builds point clouds for 4D Gaussian avatars from one portrait image.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that autoregressively generating the points of a 3D Gaussian Splatting representation with a decoder-only Transformer, while jointly predicting per-point binding information, enables adaptive point density. With the Gaussian decoder additionally conditioned on the autoregressive latent features, the pipeline produces high-fidelity, controllable 4D avatars from a single image.
What carries the argument
Decoder-only Transformer that autoregressively outputs points and their animation bindings for a 3D Gaussian Splatting point cloud, followed by a latent-conditioned Gaussian decoder.
If this is right
- Point count and density adjust automatically to match the complexity of each subject.
- Animation becomes possible directly from the predicted per-point binding data.
- Conditioning the Gaussian decoder on autoregressive latent features improves final render quality.
- The full pipeline works from a single portrait without requiring multi-view input or post-processing.
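The sequential construction described above can be sketched in miniature. This is a schematic toy, not the paper's model: the Transformer forward pass is stubbed out with random features, and the names (`generate_points`, the stop threshold, the four binding slots) are hypothetical. What it does show is the mechanism the claims rest on: each step jointly emits a point position, a binding index for animation, and a stop probability, so the total point count adapts per subject instead of being fixed in advance.

```python
import numpy as np

def generate_points(context, max_points=512, stop_threshold=0.9, rng=None):
    """Schematic autoregressive point generation (toy stand-in, not the paper's model).

    Each step consumes the sequence so far and jointly emits:
      - a 3D point position,
      - a binding index (which animation element the point follows),
      - a stop probability; generation halts once it crosses the threshold,
        which is what makes the total point count adaptive per subject.
    """
    rng = rng or np.random.default_rng(0)
    points, bindings = [], []
    for step in range(max_points):
        # Stand-in for a decoder-only Transformer pass over
        # [context; points generated so far]; here just random features.
        h = rng.standard_normal(8)
        xyz = np.tanh(h[:3])               # predicted point position in [-1, 1]^3
        binding = int(np.argmax(h[3:7]))   # joint per-point binding prediction
        stop_p = 1.0 / (1.0 + np.exp(-h[7] - 0.01 * step))
        points.append(xyz)
        bindings.append(binding)
        if stop_p > stop_threshold:        # adaptive sequence length
            break
    return np.stack(points), np.array(bindings)
```

In the real pipeline the per-step features would also be cached as the latent conditioning signal for the downstream Gaussian decoder.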
Where Pith is reading between the lines
- The sequential generation style could support incremental updates, such as adding detail to an avatar over time.
- Training data requirements might decrease because the model learns local point decisions rather than global arrangements at once.
- The same autoregressive structure could be tested on related tasks like generating dynamic scenes or objects beyond human avatars.
Load-bearing premise
That sequential autoregressive point generation plus joint binding prediction will produce higher-fidelity and more controllable avatars than generating all points simultaneously.
What would settle it
A head-to-head test on the same single-image input set where a non-autoregressive Gaussian avatar method matches or exceeds the autoregressive version on photorealism and animation quality metrics.
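The decisive experiment above is a paired comparison on a shared input set. A minimal harness for that kind of evidence, assuming hypothetical per-subject metric arrays (e.g. PSNR per test identity) for the autoregressive and a non-autoregressive method evaluated on identical single-image inputs:

```python
import numpy as np

def head_to_head(scores_a, scores_b, higher_is_better=True):
    """Paired comparison on a shared test set (schematic, hypothetical names).

    scores_a / scores_b: per-subject metric values for methods A and B on
    identical inputs. Returns the mean per-subject gap (A minus B, oriented
    so positive favors A) and the fraction of subjects where A wins.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    assert a.shape == b.shape, "a paired test needs identical inputs"
    diff = a - b if higher_is_better else b - a
    return float(diff.mean()), float((diff > 0).mean())
```

A consistently non-positive mean gap for the autoregressive method on photorealism and animation metrics would falsify the load-bearing premise; a consistently positive one would support it.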
read the original abstract
We introduce AvatarPointillist, a novel framework for generating dynamic 4D Gaussian avatars from a single portrait image. At the core of our method is a decoder-only Transformer that autoregressively generates a point cloud for 3D Gaussian Splatting. This sequential approach allows for precise, adaptive construction, dynamically adjusting point density and the total number of points based on the subject's complexity. During point generation, the AR model also jointly predicts per-point binding information, enabling realistic animation. After generation, a dedicated Gaussian decoder converts the points into complete, renderable Gaussian attributes. We demonstrate that conditioning the decoder on the latent features from the AR generator enables effective interaction between stages and markedly improves fidelity. Extensive experiments validate that AvatarPointillist produces high-quality, photorealistic, and controllable avatars. We believe this autoregressive formulation represents a new paradigm for avatar generation, and we will release our code to inspire future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AvatarPointillist, a framework for generating dynamic 4D Gaussian avatars from a single portrait image. A decoder-only Transformer autoregressively produces a point cloud for 3D Gaussian Splatting while jointly predicting per-point binding information to enable animation. A latent-conditioned Gaussian decoder then converts these points into full renderable Gaussian attributes, with the authors claiming that the autoregressive sequential construction allows adaptive point density and number based on subject complexity, yielding photorealistic and controllable results.
Significance. If the empirical claims hold, the autoregressive formulation for adaptive point generation combined with joint binding prediction and latent conditioning offers a potentially more flexible pipeline for controllable 4D avatar creation than non-autoregressive alternatives. The explicit plan to release code is a strength that supports reproducibility and future work in the area.
minor comments (2)
- [Experiments] The abstract states that 'extensive experiments validate' improved fidelity and controllability, but the main text should include explicit quantitative tables with baselines, ablations on the autoregressive component versus non-autoregressive variants, and error bars or statistical significance to make the central empirical claim fully verifiable.
- [Method] In the method description, the precise mechanism by which latent features from the AR generator are injected into the Gaussian decoder (e.g., cross-attention, concatenation, or FiLM) should be specified with an equation or diagram to ensure the claimed 'effective interaction between stages' is reproducible.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of AvatarPointillist, the recognition of the potential advantages of the autoregressive adaptive point generation combined with joint binding prediction, and the recommendation for minor revision. The explicit note on code release as a reproducibility strength is appreciated. No major comments were raised in the report.
Circularity Check
No significant circularity detected
full rationale
The paper introduces AvatarPointillist as a new autoregressive pipeline: a decoder-only Transformer generates sequential points and joint binding predictions for 3D Gaussian Splatting, followed by a latent-conditioned Gaussian decoder. No equations, fitted parameters, or derivations are shown that reduce by construction to prior outputs or self-citations. The abstract and method description present the autoregressive formulation and stage interaction as an independent design choice without invoking uniqueness theorems, ansatzes from prior self-work, or renaming of known results. The central claim of improved fidelity and controllability therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger