Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos
Pith reviewed 2026-05-25 05:06 UTC · model grok-4.3
The pith
A tri-module data augmentation system improves 3D human avatar reconstruction from monocular videos with limited frames.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TrioMan augments limited monocular video data for 3D avatar learning through three modules: the Generator imposes Gaussian perturbations on pose and camera to produce diverse unseen samples; the Refiner applies one-step diffusion conditioned on texture and geometry cues to raise sample quality; the Examiner uses dual-branch attention-based similarity evaluation to retain only subject-consistent examples. This process supplies additional useful training signal that improves reconstruction when real frames are scarce.
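As a rough illustration of the Generator step described above, the following sketch adds zero-mean Gaussian noise to a pose vector and a camera vector. It is only a sketch under stated assumptions: the 72-entry pose layout (24 joints with 3 axis-angle values each), the 6-entry camera vector, the function names, and the sigma values are all hypothetical and are not taken from the paper.

```python
import random

def perturb(values, sigma, rng):
    """Add independent zero-mean Gaussian noise to each entry of a parameter vector."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def generate_samples(pose, camera, n, sigma_pose=0.05, sigma_cam=0.02, seed=0):
    """Return n perturbed (pose, camera) pairs; sigma values are hypothetical."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        (perturb(pose, sigma_pose, rng), perturb(camera, sigma_cam, rng))
        for _ in range(n)
    ]

pose = [0.0] * 72                          # assumed: 24 joints x 3 axis-angle values
camera = [0.0, 0.0, 0.0, 0.0, 0.0, 3.0]    # assumed: 3 rotation + 3 translation values
samples = generate_samples(pose, camera, n=4)
```

Each returned pair would then be rendered into a candidate training view; how the paper renders or constrains these perturbations is not stated in the text available here.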
What carries the argument
The tri-module Generator-Refiner-Examiner pipeline, where Generator perturbs pose and camera, Refiner performs guided one-step diffusion, and Examiner applies dual-branch attention similarity filtering.
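The control flow of such a pipeline can be sketched as a plain generate, refine, examine loop. The `augment` function and its three callables below are hypothetical placeholders, and the toy usage operates on numbers rather than rendered frames; nothing here reflects the authors' actual interfaces.

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")

def augment(seeds: Iterable[S],
            generate: Callable[[S], List[S]],
            refine: Callable[[S], S],
            examine: Callable[[S], bool]) -> List[S]:
    """Generate candidates from each seed, refine them, keep those the examiner accepts."""
    kept = []
    for seed in seeds:
        for candidate in generate(seed):
            refined = refine(candidate)
            if examine(refined):
                kept.append(refined)
    return kept

# Toy stand-ins: numbers instead of rendered frames.
result = augment(
    seeds=[1.0, 2.0],
    generate=lambda s: [s - 0.5, s + 0.5],   # two perturbed candidates per seed
    refine=lambda c: round(c, 1),            # placeholder for the diffusion step
    examine=lambda r: r >= 1.0,              # placeholder for the consistency check
)
```

Keeping the three stages as separate callables mirrors the modular description in the text and makes it easy to ablate one stage at a time.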
If this is right
- Augmented samples enable capture of fine-grained details that standard per-subject optimization misses under data limits.
- The framework outperforms existing methods on the X-Humans and NeuMan benchmarks.
- Subject-consistent extra data reduces dependence on generic human priors for avatar quality.
Where Pith is reading between the lines
- The same generate-refine-examine loop could apply to other sparse-view 3D reconstruction problems beyond human avatars.
- If the examiner's attention scoring proves reliable, similar filtering might improve synthetic data use in related vision tasks.
Load-bearing premise
That the perturbed, diffused, and filtered samples remain subject-consistent and supply useful training signal beyond the original limited frames.
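One way to picture the subject-consistency check this premise depends on is a two-branch similarity gate. The sketch below is not the paper's Examiner: it swaps the attention-based scoring for plain cosine similarity, and the branch names `branch_a` and `branch_b`, the feature vectors, and the 0.9 threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_consistent(ref, cand, threshold=0.9):
    """Accept a candidate only if both feature branches agree with the reference.

    ref and cand are dicts with the hypothetical keys 'branch_a' and 'branch_b'.
    """
    return all(cosine(ref[k], cand[k]) >= threshold for k in ("branch_a", "branch_b"))

reference = {"branch_a": [1.0, 0.0], "branch_b": [0.0, 1.0]}
close = {"branch_a": [0.9, 0.1], "branch_b": [0.1, 0.9]}     # small drift in both branches
drifted = {"branch_a": [0.9, 0.1], "branch_b": [1.0, 0.0]}   # second branch disagrees

accepted = [c for c in (close, drifted) if is_consistent(reference, c)]
```

Requiring both branches to pass means a sample that matches on one cue but drifts on the other is rejected, which is the failure mode the premise is concerned with.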
What would settle it
Running the full TrioMan pipeline on X-Humans or NeuMan videos with few frames and finding no measurable gain in avatar reconstruction metrics compared with training on the original frames alone.
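The comparison described above could be scored along these lines. This is a minimal sketch: the text available here does not name the evaluation metrics, so PSNR is assumed as one commonly used image-quality measure, and the pixel values are toy data rather than results.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def mean_gain(baseline_scores, augmented_scores):
    """Average per-frame metric difference (augmented minus baseline)."""
    diffs = [a - b for a, b in zip(augmented_scores, baseline_scores)]
    return sum(diffs) / len(diffs)

# Toy pixel values, not measurements from any benchmark.
target = [0.0, 0.5, 1.0, 0.5]
baseline_render = [0.1, 0.4, 0.9, 0.6]        # avatar trained on original frames only
augmented_render = [0.05, 0.45, 0.95, 0.55]   # avatar trained with augmented frames

gain = mean_gain([psnr(baseline_render, target)], [psnr(augmented_render, target)])
```

A mean gain that is indistinguishable from zero across held-out frames would count against the core claim; a consistently positive gain would support it.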
Original abstract
This paper addresses the challenge of reconstructing photorealistic and animatable 3D human avatars from monocular videos. While existing methods rely on combining per-subject optimization with generic human priors, they often fail to capture fine-grained details when training frames are limited. To mitigate this data scarcity, we propose TrioMan, a systematic tri-module framework for augmented 3D avatar learning. Our approach comprises three synergistic components. The Generator creates diverse unseen samples by imposing Gaussian perturbations on pose and camera. The Refiner improves the quality of generated data through one-step diffusion guided by texture and geometry cues. The Examiner selects subject-consistent samples using a dual-branch attention-based similarity evaluation. Experiments on the X-Humans and NeuMan benchmarks show that TrioMan outperforms state-of-the-art methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TrioMan, a tri-module data augmentation framework for reconstructing photorealistic and animatable 3D human avatars from monocular videos with limited frames. The Generator creates diverse samples via Gaussian perturbations on pose and camera parameters; the Refiner enhances them using one-step diffusion conditioned on texture and geometry cues; the Examiner filters for subject consistency with a dual-branch attention-based similarity metric. The central claim is that this pipeline yields useful additional training signal and outperforms prior methods on the X-Humans and NeuMan benchmarks.
Significance. If the experimental claims hold, the framework would offer a practical route to mitigate data scarcity in per-subject avatar optimization, potentially improving fine-grained detail capture without requiring additional real captures or heavier reliance on generic human priors.
Major comments (1)
- [Abstract / Experiments] The claim that 'Experiments on the X-Humans and NeuMan benchmarks show that TrioMan outperforms state-of-the-art methods' is unsupported by any reported metrics, tables, ablation studies, error analysis, or implementation details. This directly undermines assessment of whether the Generator-Refiner-Examiner pipeline produces subject-consistent, high-signal augmentations as assumed.
Minor comments (1)
- [Method] The description of the dual-branch attention mechanism in the Examiner and the precise conditioning signals in the Refiner would benefit from explicit algorithmic pseudocode or equations to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful review and the identification of this critical issue with the experimental claims. We address the comment below.
Referee: [Abstract / Experiments] Abstract / Experiments section: The claim that 'Experiments on the X-Humans and NeuMan benchmarks show that TrioMan outperforms state-of-the-art methods' is unsupported by any reported metrics, tables, ablation studies, error analysis, or implementation details. This directly undermines assessment of whether the Generator-Refiner-Examiner pipeline produces subject-consistent, high-signal augmentations as assumed.
Authors: We agree that the claim in the abstract is currently unsupported in the manuscript. The provided text consists only of the abstract and does not contain any quantitative results, tables, ablations, error analysis, or implementation details. In the revised version we will add a full Experiments section with metrics on X-Humans and NeuMan, direct comparisons to prior methods, module ablations, and implementation details so that the performance claims can be properly evaluated.
Revision: yes
Circularity Check
No significant circularity
The paper presents a tri-module data augmentation framework (Generator-Refiner-Examiner) for 3D avatar learning, with claims resting on empirical benchmark results rather than any mathematical derivation chain. No equations, fitted parameters, load-bearing self-citations, or ansatzes are described in the provided text. The central claim (outperformance on X-Humans and NeuMan) is an experimental outcome, not a quantity that reduces to its own inputs by construction. The claim is checked against external benchmarks, so nothing in it reduces to a tautology.