Recognition: unknown
MUA: Mobile Ultra-detailed Animatable Avatars
Pith reviewed 2026-05-10 04:41 UTC · model grok-4.3
The pith
Wavelet-guided blendshapes distill high-fidelity avatar details into a compact form that runs in real time on mobile VR headsets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By coupling multi-level wavelet spectral decomposition with low-rank structural factorization in texture space, our method achieves up to 2000X lower computational cost and a 10X smaller model size than the original high-quality teacher avatar model, while preserving visually plausible dynamics and appearance details closely resembling those of the teacher model. The representation, called Wavelet-guided Multi-level Spatial Factorized Blendshapes, runs at over 180 FPS on desktop hardware and achieves native real-time performance at 24 FPS on a standalone Meta Quest 3.
What carries the argument
Wavelet-guided Multi-level Spatial Factorized Blendshapes, which applies multi-level wavelet decomposition to avatar textures and pairs it with low-rank factorization to encode dynamic geometry and appearance in a compact form.
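To make the machinery concrete, here is a minimal sketch in Python of the two operations the representation combines: a multi-level 2D wavelet decomposition of a texture map, followed by truncated-SVD low-rank factorization of each sub-band. The wavelet family, level count, rank, and helper names (truncate_rank, compress_texture) are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch only: multi-level wavelet decomposition + low-rank factorization
# of a texture map. Assumes PyWavelets and NumPy; parameters are illustrative.
import numpy as np
import pywt  # PyWavelets

def truncate_rank(band, rank):
    """Keep only the top-`rank` singular components of a 2D sub-band."""
    u, s, vt = np.linalg.svd(band, full_matrices=False)
    k = min(rank, len(s))
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def compress_texture(texture, levels=3, rank=8):
    """Wavelet-decompose, factorize every sub-band, and reconstruct."""
    coeffs = pywt.wavedec2(texture, wavelet="haar", level=levels)
    approx = truncate_rank(coeffs[0], rank)
    details = [tuple(truncate_rank(d, rank) for d in triple) for triple in coeffs[1:]]
    return pywt.waverec2([approx] + details, wavelet="haar")

tex = np.random.rand(256, 256)  # stand-in for one channel of a texture map
recon = compress_texture(tex)
print("mean abs error:", np.abs(recon - tex).mean())
```

The compactness comes from storing, per sub-band, two thin factors instead of the full coefficient grid; the wavelet split keeps coarse shape and fine detail in separate bands so each can be truncated at a different rank.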
If this is right
- Outperforms existing methods designed for mobile platforms in rendering quality while matching or exceeding most server-only approaches.
- Enables over 180 FPS on desktop PCs and native 24 FPS on standalone devices such as the Meta Quest 3.
- Makes high-fidelity full-body avatars practical for immersive VR and AR applications without requiring server-class GPUs.
- Reduces model size by roughly 10X while keeping visually plausible motion and appearance.
Where Pith is reading between the lines
- The same distillation pattern could let other heavy 3D models run on phones or headsets by moving computation into a compact spectral form.
- Real-time on-device performance removes the need for constant cloud streaming, which could expand personalized avatar use in consumer games and social VR.
- If the wavelet levels can be adjusted dynamically, the method might support graceful quality scaling based on available battery or bandwidth.
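A speculative sketch of that last point, assuming quality can be traded off by zeroing the finest wavelet detail bands before reconstruction; this is our reading of how graceful degradation could work, not a mechanism described in the paper.

```python
# Speculative sketch: level-based quality scaling by dropping fine detail bands.
# Function name and parameters are hypothetical; assumes PyWavelets and NumPy.
import numpy as np
import pywt

def reconstruct_at_quality(texture, total_levels, keep_levels):
    """Zero the finest (total_levels - keep_levels) detail bands, then reconstruct."""
    coeffs = pywt.wavedec2(texture, wavelet="haar", level=total_levels)
    drop = total_levels - keep_levels
    # coeffs[1] holds the coarsest detail triple, coeffs[-1] the finest.
    for i in range(len(coeffs) - drop, len(coeffs)):
        coeffs[i] = tuple(np.zeros_like(d) for d in coeffs[i])
    return pywt.waverec2(coeffs, wavelet="haar")

# e.g., halve the detail levels when battery or bandwidth is constrained:
# low_quality = reconstruct_at_quality(texture, total_levels=4, keep_levels=2)
```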
Load-bearing premise
The distillation pipeline transfers the motion-aware clothing dynamics and fine appearance details from the teacher model without introducing noticeable artifacts or fidelity loss at the reduced resolution and compute budget.
What would settle it
A direct visual comparison on Meta Quest 3 hardware between the distilled model and the original teacher at 24 FPS that reveals missing clothing folds, blurred textures, or new artifacts.
Original abstract
Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed along two directions: improving the fidelity of dynamic geometry and appearance, or reducing computational complexity to enable deployment on resource-constrained platforms, e.g., VR headsets. However, existing approaches fail to achieve both goals simultaneously: Ultra-high-fidelity avatars typically require substantial computation on server-class GPUs, whereas lightweight avatars often suffer from limited surface dynamics, reduced appearance details, and noticeable artifacts. To bridge this gap, we propose a novel animatable avatar representation, termed Wavelet-guided Multi-level Spatial Factorized Blendshapes, and a corresponding distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance details from a pre-trained ultra-high-quality avatar model into a compact, efficient representation. By coupling multi-level wavelet spectral decomposition with low-rank structural factorization in texture space, our method achieves up to 2000X lower computational cost and a 10X smaller model size than the original high-quality teacher avatar model, while preserving visually plausible dynamics and appearance details closely resemble those of the teacher model. Extensive comparisons with state-of-the-art methods show that our approach significantly outperforms existing avatar approaches designed for mobile settings and achieves comparable or superior rendering quality to most approaches that can only run on servers. Importantly, our representation substantially improves the practicality of high-fidelity avatars for immersive applications, achieving over 180 FPS on a desktop PC and real-time native on-device performance at 24 FPS on a standalone Meta Quest 3.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Wavelet-guided Multi-level Spatial Factorized Blendshapes as a compact animatable avatar representation together with a distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance from a high-quality teacher model. It claims up to 2000X lower computational cost and 10X smaller model size than the teacher while preserving visually plausible dynamics, enabling >180 FPS on desktop and real-time 24 FPS native performance on Meta Quest 3, and outperforming prior mobile avatar methods.
Significance. If the efficiency and fidelity claims hold with rigorous validation, the work would meaningfully advance practical deployment of high-detail full-body avatars on consumer VR/AR hardware by combining spectral decomposition with low-rank factorization in a distillation setting. The approach addresses a clear gap between server-only ultra-fidelity models and lightweight but low-dynamic alternatives.
major comments (2)
- [Abstract, §4 (Experiments)] The headline claims of 2000X lower cost and 10X smaller size are stated without accompanying quantitative tables, baseline comparisons, error bars, or evaluation-protocol details. No per-region or frequency-band metrics (e.g., high-frequency PSNR on clothing wrinkles, temporal coherence scores) are reported to verify that the distillation retains motion-dependent dynamics rather than smoothing them.
- [§3 (Method)] The Wavelet-guided Multi-level Spatial Factorized Blendshapes representation depends on free parameters (number of wavelet levels, low-rank factorization rank) whose effect on preserving high-frequency temporal components of clothing deformation is not analyzed; truncation or aliasing in the wavelet bands during factorization could produce the very artifacts the method aims to avoid, yet no sensitivity study or spectral error analysis is provided.
minor comments (1)
- [Abstract] The sentence 'appearance details closely resemble those of the teacher model' is ungrammatical in context ('resemble' should read 'resembling' or 'that resemble') and should be rephrased for clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the quantitative validation and analysis.
Point-by-point responses
- Referee: [Abstract, §4 (Experiments)] The headline claims of 2000X lower cost and 10X smaller size are stated without accompanying quantitative tables, baseline comparisons, error bars, or evaluation-protocol details. No per-region or frequency-band metrics (e.g., high-frequency PSNR on clothing wrinkles, temporal coherence scores) are reported to verify that the distillation retains motion-dependent dynamics rather than smoothing them.
  Authors: We agree that the efficiency claims require more explicit quantitative backing. In the revised manuscript we will add a dedicated table in §4 reporting measured computational cost (FLOPs and wall-clock inference time on desktop and Quest 3 hardware), model size (parameters and MB), and direct comparisons against the teacher model as well as prior mobile and server-based baselines. Error bars from repeated runs will be included where relevant, and the evaluation protocol (sequences, hardware, measurement methodology) will be fully specified. To confirm retention of motion-dependent dynamics we will additionally report per-region metrics on clothing areas together with frequency-band PSNR and temporal coherence scores computed over animation sequences. revision: yes
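For concreteness, a minimal Python sketch of the kind of metrics this response promises: a frequency-band PSNR and a simple temporal-coherence proxy. The Gaussian low-pass band split, the [0, 1] intensity assumption, and the function names are our assumptions, not the authors' protocol.

```python
# Sketch only: high-frequency PSNR and a temporal-coherence proxy.
# Assumes NumPy and SciPy; the band split via Gaussian blur is illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def highfreq_psnr(pred, gt, sigma=2.0):
    """PSNR on the high-pass residual (image minus a Gaussian low-pass)."""
    hp_pred = pred - gaussian_filter(pred, sigma)
    hp_gt = gt - gaussian_filter(gt, sigma)
    mse = np.mean((hp_pred - hp_gt) ** 2)
    return float(10.0 * np.log10(1.0 / max(mse, 1e-12)))  # assumes [0, 1] intensities

def temporal_coherence_error(frames_pred, frames_gt):
    """Mean discrepancy between successive frame differences, a simple proxy."""
    errs = [np.abs((p2 - p1) - (g2 - g1)).mean()
            for p1, p2, g1, g2 in zip(frames_pred, frames_pred[1:],
                                      frames_gt, frames_gt[1:])]
    return float(np.mean(errs))
```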
- Referee: [§3 (Method)] The Wavelet-guided Multi-level Spatial Factorized Blendshapes representation depends on free parameters (number of wavelet levels, low-rank factorization rank) whose effect on preserving high-frequency temporal components of clothing deformation is not analyzed; truncation or aliasing in the wavelet bands during factorization could produce the very artifacts the method aims to avoid, yet no sensitivity study or spectral error analysis is provided.
  Authors: The number of wavelet levels and the factorization rank were chosen via preliminary experiments to balance compactness and fidelity. We acknowledge that a dedicated sensitivity study is missing. In the revision we will add an ablation study (in §4 or an appendix) that systematically varies both parameters and reports their effect on high-frequency detail preservation using spectral error metrics, temporal coherence, and visual comparisons. This analysis will also address potential truncation or aliasing artifacts. revision: yes
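A sketch of what the promised sensitivity sweep could look like, reusing the hypothetical compress_texture helper from the earlier sketch; the grid values are illustrative, not the paper's.

```python
# Sketch only: grid sweep over the two free parameters (levels, rank),
# reusing the hypothetical compress_texture helper defined earlier.
import itertools
import numpy as np

def sensitivity_sweep(texture, levels_grid=(2, 3, 4), rank_grid=(4, 8, 16)):
    """Vary wavelet levels and factorization rank; report reconstruction error."""
    results = []
    for levels, rank in itertools.product(levels_grid, rank_grid):
        recon = compress_texture(texture, levels=levels, rank=rank)
        results.append((levels, rank, float(np.abs(recon - texture).mean())))
    return results
```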
Circularity Check
No circularity in derivation; novel representation and distillation are independent
Full rationale
The paper proposes a new animatable avatar representation (Wavelet-guided Multi-level Spatial Factorized Blendshapes) together with a distillation pipeline from a pre-trained teacher model. Performance claims (2000X lower cost, 10X smaller size, real-time FPS) are presented as empirical outcomes of this architecture and transfer process, not as quantities defined by the same fitted parameters or established merely through chains of self-citation. No equations or steps in the provided text equate the claimed results to their inputs via self-definition, fitted-input renaming, or load-bearing self-citation; the claims are instead positioned against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of wavelet decomposition levels
- low-rank factorization rank
axioms (1)
- domain assumption: a pre-trained ultra-high-quality teacher model supplies accurate motion-aware clothing dynamics and fine appearance details that can be distilled
invented entities (1)
- Wavelet-guided Multi-level Spatial Factorized Blendshapes (no independent evidence)
Reference graph
Works this paper leans on
- [1] Z. Chen, Y. Wang, T. Sun, X. Gao, W. Chen, Z. Pan, H. Qu, and Y. Wu, “Exploring the design space of immersive urban analytics,” Visual Informatics, vol. 1, no. 2, pp. 132–142, 2017.
- [2] C. Sayffaerth, “Educational twin: the influence of artificial xr expert duplicates on future learning,” arXiv preprint arXiv:2504.13896, 2025.
- [3] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in Eur. Conf. Comput. Vis., 2020.
- [4] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 1–14, 2023.
- [5] Y. Wang, Q. Han, M. Habermann, K. Daniilidis, C. Theobalt, and L. Liu, “Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction,” in Int. Conf. Comput. Vis., 2023.
- [6] Z. Li, Z. Zheng, L. Wang, and Y. Liu, “Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 19711–19722.
- [7] M. Habermann, L. Liu, W. Xu, G. Pons-Moll, M. Zollhoefer, and C. Theobalt, “Hdhumans: A hybrid approach for high-fidelity digital humans,” Proceedings of the ACM on Computer Graphics and Interactive Techniques, vol. 6, no. 3, pp. 1–23, 2023.
- [8] R. Li, J. Tanke, M. Vo, M. Zollhofer, J. Gall, A. Kanazawa, and C. Lassner, “Tava: Template-free animatable volumetric actors,” 2022.
- [9] S. Wang, K. Schwarz, A. Geiger, and S. Tang, “Arah: Animatable volume rendering of articulated human sdfs,” in Eur. Conf. Comput. Vis., 2022.
- [10] H. Pang, H. Zhu, A. Kortylewski, C. Theobalt, and M. Habermann, “Ash: Animatable gaussian splats for efficient and photoreal human rendering,” in IEEE Conf. Comput. Vis. Pattern Recog., June 2024, pp. 1165–1175.
- [11] H. Zhu, G. Sun, C. Theobalt, and M. Habermann, “Uma: Ultra-detailed human avatars via multi-level surface alignment,” arXiv preprint arXiv:2506.01802, 2025.
- [12] N. Karaev, I. Rocco, B. Graham, N. Neverova, A. Vedaldi, and C. Rupprecht, “Cotracker: It is better to track together,” in European Conference on Computer Vision. Springer, 2024, pp. 18–35.
- [13] F. Iandola, S. Pidhorskyi, I. Santesteban, D. Gupta, A. Pahuja, N. Bartolovic, F. Yu, E. Garbin, T. Simon, and S. Saito, “Squeezeme: Mobile-ready distillation of gaussian full-body avatars,” in Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers, 2025, pp. 1–11.
- [14] J. Chen, J. Hu, G. Wang, Z. Jiang, T. Zhou, Z. Chen, and C. Lv, “Taoavatar: Real-time lifelike full-body talking avatars for augmented reality via 3d gaussian splatting,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10723–10734.
- [15] G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. A. Osman, D. Tzionas, and M. J. Black, “Expressive body capture: 3d hands, face, and body from a single image,” in IEEE Conf. Comput. Vis. Pattern Recog., 2019, pp. 10975–10985.
- [16] A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su, “Tensorf: Tensorial radiance fields,” in European Conference on Computer Vision. Springer, 2022, pp. 333–350.
- [17] S. Peng, Y. Zhang, Y. Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou, “Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans,” in IEEE Conf. Comput. Vis. Pattern Recog., 2021, pp. 9054–9063.
- [18] S. Lombardi, T. Simon, G. Schwartz, M. Zollhofer, Y. Sheikh, and J. M. Saragih, “Mixture of volumetric primitives for efficient neural rendering,” ACM Trans. Graph., vol. 40, no. 4, pp. 59:1–59:13, 2021.
- [19] Z. Wang, T. Bagautdinov, S. Lombardi, T. Simon, J. Saragih, J. Hodgins, and M. Zollhofer, “Learning compositional radiance fields of dynamic human heads,” 2020.
- [20] C.-Y. Weng, B. Curless, P. P. Srinivasan, J. T. Barron, and I. Kemelmacher-Shlizerman, “HumanNeRF: Free-viewpoint rendering of moving people from monocular video,” in IEEE Conf. Comput. Vis. Pattern Recog., June 2022, pp. 16210–16220.
- [21] M. Işık, M. Runz, M. Georgopoulos, T. Khakhulin, J. Starck, L. Agapito, and M. Niessner, “Humanrf: High-fidelity neural radiance fields for humans in motion,” ACM Trans. Graph., vol. 42, no. 4, pp. 1–12, 2023.
- [22] Z. Xu, Y. Xu, Z. Yu, S. Peng, J. Sun, H. Bao, and X. Zhou, “Representing long volumetric video with temporal gaussian hierarchy,” ACM Transactions on Graphics (TOG), vol. 43, no. 6, pp. 1–18, 2024.
- [23] Y. Jiang, Z. Shen, C. Guo, Y. Hong, Z. Su, Y. Zhang, M. Habermann, and L. Xu, “Reperformer: Immersive human-centric volumetric videos from playback to photoreal reperformance,” arXiv preprint arXiv:2503.12242, 2025.
- [24] D. Xiang, F. Prada, T. Bagautdinov, W. Xu, Y. Dong, H. Wen, J. Hodgins, and C. Wu, “Modeling clothing as a separate layer for an animatable human avatar,” ACM Trans. Graph., vol. 40, no. 6, pp. 1–15, 2021.
- [25] T. Alldieck, M. Magnor, W. Xu, C. Theobalt, and G. Pons-Moll, “Detailed human avatars from monocular video,” in International Conference on 3D Vision, Sep. 2018, pp. 98–109.
- [26] T. Alldieck, M. Magnor, B. L. Bhatnagar, C. Theobalt, and G. Pons-Moll, “Learning to reconstruct people in clothing from a single RGB camera,” in IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2019, pp. 1175–1186.
- [27] M. Habermann, W. Xu, M. Zollhofer, G. Pons-Moll, and C. Theobalt, “Deepcap: Monocular human performance capture using weak supervision,” in IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 5052–5063.
- [28] M. Habermann, W. Xu, M. Zollhoefer, G. Pons-Moll, and C. Theobalt, “Livecap: Real-time human performance capture from monocular video,” ACM Transactions on Graphics (TOG), vol. 38, no. 2, pp. 1–17, 2019.
- [29] Y. Xiu, J. Yang, X. Cao, D. Tzionas, and M. J. Black, “ECON: Explicit Clothed humans Optimized via Normal integration,” in IEEE Conf. Comput. Vis. Pattern Recog., June 2023.
- [30] Z. Zhang, Z. Yang, and Y. Yang, “Sifu: Side-view conditioned implicit function for real-world usable clothed human reconstruction,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024, pp. 9936–9947.
- [31] C. Zheng, L. Xue, J. Zarate, and J. Song, “Gstar: Gaussian surface tracking and reconstruction,” arXiv preprint arXiv:2501.10283, 2025.
- [32] H. Zhu, L. Qiu, Y. Qiu, and X. Han, “Registering explicit to implicit: Towards high-fidelity garment mesh reconstruction from single images,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022, pp. 3845–3854.
- [33] Y. Kwon, D. Kim, D. Ceylan, and H. Fuchs, “Neural human performer: Learning generalizable radiance fields for human performance rendering,” Adv. Neural Inform. Process. Syst., 2021.
- [34] Q. Wang, Z. Wang, K. Genova, P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. Funkhouser, “Ibrnet: Learning multi-view image-based rendering,” in IEEE Conf. Comput. Vis. Pattern Recog., 2021.
- [35] E. Remelli, T. M. Bagautdinov, S. Saito, C. Wu, T. Simon, S. Wei, K. Guo, Z. Cao, F. Prada, J. M. Saragih, and Y. Sheikh, “Drivable volumetric avatars using texel-aligned features,” in SIGGRAPH (Conference Paper Track), 2022, pp. 56:1–56:9.
- [36] A. Shetty, M. Habermann, G. Sun, D. Luvizon, V. Golyanik, and C. Theobalt, “Holoported characters: Real-time free-viewpoint rendering of humans from sparse rgb cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1206–1215.
- [37] G. Sun, R. Dabral, P. Fua, C. Theobalt, and M. Habermann, “Metacap: Meta-learning priors from multi-view imagery for sparse-view human performance capture and rendering,” in ECCV, 2024.
- [38] G. Sun, R. Dabral, H. Zhu, P. Fua, C. Theobalt, and M. Habermann, “Real-time free-view human rendering from sparse-view rgb videos using double unprojected textures,” June 2025.
- [39] A. Zubekhin, H. Zhu, P. Gotardo, T. Beeler, M. Habermann, and C. Theobalt, “Giga: Generalizable sparse image-driven gaussian humans,” arXiv, 2025.
- [40] Blender Foundation, “Blender,” 2025. [Online]. Available: https://www.blender.org
- [41] C. Stoll, J. Gall, E. De Aguiar, S. Thrun, and C. Theobalt, “Video-based reconstruction of animatable human characters,” TOG, vol. 29, no. 6, pp. 1–10, 2010.
- [42] P. Guan, L. Reiss, D. A. Hirshberg, A. Weiss, and M. J. Black, “Drape: Dressing any person,” TOG, vol. 31, no. 4, pp. 1–10, 2012.
- [43] F. Xu, Y. Liu, C. Stoll, J. Tompkin, G. Bharaj, Q. Dai, H.-P. Seidel, J. Kautz, and C. Theobalt, “Video-based characters: creating new human performances from a multi-view video database,” in ACM SIGGRAPH 2011 papers, 2011, pp. 1–10.
- [44] D. Casas, M. Volino, J. Collomosse, and A. Hilton, “4d video textures for interactive character appearance,” Comput. Graph. Forum, vol. 33, no. 2, pp. 371–380, May 2014.
- [45] A. Shysheya, E. Zakharov, K.-A. Aliev, R. Bashirov, E. Burkov, K. Iskakov, A. Ivakhnenko, Y. Malkov, I. Pasechnik, D. Ulyanov et al., “Textured neural avatars,” in IEEE Conf. Comput. Vis. Pattern Recog., 2019, pp. 2387–2397.
- [46] T. Bagautdinov, C. Wu, T. Simon, F. Prada, T. Shiratori, S.-E. Wei, W. Xu, Y. Sheikh, and J. Saragih, “Driving-signal aware full-body avatars,” ACM Transactions on Graphics (TOG), vol. 40, no. 4, pp. 1–17, 2021.
- [47] D. Xiang, T. Bagautdinov, T. Stuyck, F. Prada, J. Romero, W. Xu, S. Saito, J. Guo, B. Smith, T. Shiratori et al., “Dressing avatars: Deep photorealistic appearance for physically simulated clothing,” ACM Trans. Graph., vol. 41, no. 6, pp. 1–15, 2022.
- [48] M. Habermann, L. Liu, W. Xu, M. Zollhoefer, G. Pons-Moll, and C. Theobalt, “Real-time deep dynamic characters,” ACM Trans. Graph., vol. 40, no. 4, Aug. 2021.
- [49] R. W. Sumner, J. Schmid, and M. Pauly, “Embedded deformation for shape manipulation,” ACM Trans. Graph., vol. 26, no. 3, p. 80–es, Jul. 2007.
- [50] Y. Chen, Z. Zheng, Z. Li, C. Xu, and Y. Liu, “Meshavatar: Learning high-quality triangular human avatars from multi-view videos,” in Eur. Conf. Comput. Vis. Springer, 2024, pp. 250–269.
- [51] P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “Neus: learning neural implicit surfaces by volume rendering for multi-view reconstruction,” in Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021, pp. 27171–27183.
- [52] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “SMPL: A skinned multi-person linear model,” ACM Trans. Graphics (Proc. SIGGRAPH Asia), vol. 34, no. 6, pp. 248:1–248:16, Oct. 2015.
- [53] A. Osman, T. Bolkart, and M. J. Black, “Star: Sparse trained articulated human body regressor,” in Eur. Conf. Comput. Vis., 2020, pp. 598–613.
- [54] H. Joo, T. Simon, and Y. Sheikh, “Total capture: A 3d deformation model for tracking faces, hands, and bodies,” in IEEE Conf. Comput. Vis. Pattern Recog., 2018, pp. 8320–8329.
- [55] L. Liu, M. Habermann, V. Rudnev, K. Sarkar, J. Gu, and C. Theobalt, “Neural actor: Neural free-view synthesis of human actors with pose control,” ACM Trans. Graph. (ACM SIGGRAPH Asia), 2021.
- [56] S. Peng, J. Dong, Q. Wang, S. Zhang, Q. Shuai, X. Zhou, and H. Bao, “Animatable neural radiance fields for modeling dynamic human bodies,” in Int. Conf. Comput. Vis., 2021, pp. 14314–14323.
- [57] H. Xu, T. Alldieck, and C. Sminchisescu, “H-nerf: Neural radiance fields for rendering and temporal reconstruction of humans in motion,” Adv. Neural Inform. Process. Syst., vol. 34, pp. 14955–14966, 2021.
- [58] Q. Gao, Y. Wang, L. Liu, L. Liu, C. Theobalt, and B. Chen, “Neural novel actor: Learning a generalized animatable neural representation for human actors,” IEEE Trans. Vis. Comput. Graph., 2023.
- [59] Z. Zheng, X. Zhao, H. Zhang, B. Liu, and Y. Liu, “Avatarrex: Real-time expressive full-body avatars,” ACM Trans. Graph., vol. 42, no. 4, 2023.
- [60] Y. Kwon, L. Liu, H. Fuchs, M. Habermann, and C. Theobalt, “Deliffas: Deformable light fields for fast avatar synthesis,” Adv. Neural Inform. Process. Syst., 2023.
- [61] H. Zhu, F. Zhan, C. Theobalt, and M. Habermann, “Trihuman: A real-time and controllable tri-plane representation for detailed human geometry and appearance synthesis,” arXiv preprint arXiv:2312.05161, 2023.
- [62] E. R. Chan, C. Z. Lin, M. A. Chan, K. Nagano, B. Pan, S. D. Mello, O. Gallo, L. Guibas, J. Tremblay, S. Khamis, T. Karras, and G. Wetzstein, “Efficient geometry-aware 3D generative adversarial networks,” in CVPR, 2022.
- [63] Q. Ma, S. Saito, J. Yang, S. Tang, and M. J. Black, “Scale: Modeling clothed humans with a surface codec of articulated local elements,” in CVPR, 2021, pp. 16082–16093.
- [64] Q. Ma, J. Yang, S. Tang, and M. J. Black, “The power of points for modeling humans in clothing,” in ICCV, 2021, pp. 10974–10984.
- [65] S. Lin, H. Zhang, Z. Zheng, R. Shao, and Y. Liu, “Learning implicit templates for point-based clothed human modeling,” in ECCV. Springer, 2022, pp. 210–228.
- [66] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “Smpl: A skinned multi-person linear model,” ACM Transactions on Graphics, vol. 34, no. 6, 2015.
- [67] J. Lei, Y. Wang, G. Pavlakos, L. Liu, and K. Daniilidis, “Gart: Gaussian articulated template models,” in CVPR, 2024.
- [68] Z. Qian, S. Wang, M. Mihajlovic, A. Geiger, and S. Tang, “3dgs-avatar: Animatable avatars via deformable 3d gaussian splatting,” in CVPR, 2024.
- [69] S. Hu and Z. Liu, “Gauhuman: Articulated gaussian splatting from monocular human videos,” in CVPR, 2024.
- [70] M. Kocabas, J.-H. R. Chang, J. Gabriel, O. Tuzel, and A. Ranjan, “Hugs: Human gaussian splats,” in CVPR, 2024.
- [71] L. Hu, H. Zhang, Y. Zhang, B. Zhou, B. Liu, S. Zhang, and L. Nie, “Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians,” in CVPR, 2024.
- [72] S. Ma, T. Simon, J. Saragih, D. Wang, Y. Li, F. De la Torre, and Y. Sheikh, “Pixel codec avatars,” in CVPR, June 2021, pp. 64–73.
- [73] R. Bashirov, A. Larionov, E. Ustinova, M. Sidorenko, D. Svitov, I. Zakharkin, and V. Lempitsky, “Morf: Mobile realistic fullbody avatars from a monocular video,” in CVPR, 2024, pp. 3545–3555.
- [74] Z. Shao, Z. Wang, Z. Li, D. Wang, X. Lin, Y. Zhang, M. Fan, and Z. Wang, “SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting,” in Computer Vision and Pattern Recognition (CVPR), 2024.
- [75] TheCaptury, “The Captury,” http://www.thecaptury.com/, 2020.
- [76] L. Kavan, S. Collins, J. Žára, and C. O’Sullivan, “Skinning with dual quaternions,” in Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, 2007, pp. 39–46.
- [77] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Part III. Springer, 2015, pp. 234–241.
- [78] Y. Wang, X. Tao, X. Qi, X. Shen, and J. Jia, “Image inpainting via generative multi-column convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 31, 2018.
- [79] A. Maćkiewicz and W. Ratajczak, “Principal components analysis (pca),” Computers & Geosciences, vol. 19, no. 3, pp. 303–342, 1993.
- [80] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “Lora: Low-rank adaptation of large language models,” ICLR, 2022.