PHAF-Personalized Hand Avatars in a Flash

Akanxit Upadhyay; Anmol Namdev; Green Rosh KS; Meghana Shankar; Pawan Prasad BH

arxiv: 2606.03420 · v1 · pith:LY7VUODHnew · submitted 2026-06-02 · 💻 cs.CV

PHAF-Personalized Hand Avatars in a Flash

Meghana Shankar , Akanxit Upadhyay , Anmol Namdev , Green Rosh KS , Pawan Prasad BH This is my paper

Pith reviewed 2026-06-28 10:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords personalized hand avatarstwo-view synthesistexture generationreal-time renderingAR/VR applicationsparametric hand modelview-based inpaintingmesh alignment

0 comments

The pith

From two hand photos, a system builds personalized avatars with textures ready for real-time edge use in under a minute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method to generate photo-realistic personalized hand avatars from only dorsal and palmar views. It replaces slow per-subject optimization with a pipeline that aligns a parametric mesh using semantic guidance, extracts densified textures, and refines them via a view-based inpainting network. The result matches the visual quality of prior techniques while cutting texture generation time by a factor of 30. Avatars support novel viewpoints and standard graphics pipelines for articulation. This speed makes custom hands practical for on-device AR and VR without heavy compute.

Core claim

PHAF produces personalized hand avatars from two images by performing semantic guided mesh alignment, densified texture extraction, and refinement with a view-based inpainting network; the resulting textures achieve visual fidelity comparable to optimization-based methods while reducing generation time by 30 times and supporting real-time deployment on edge devices together with accurate articulations from a parametric hand model.

What carries the argument

Semantic guided mesh alignment together with densified texture extraction and a view-based inpainting network that transfers high-frequency details and ensures continuous appearance across viewpoints.

If this is right

Avatars generalize to novel viewpoints without retraining.
Parametric model ensures accurate joint articulations during animation.
Textures integrate directly with standard graphics engines.
Texture creation drops from minutes to seconds, enabling on-device use.
The pipeline supports practical AR and VR applications that require custom hands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The two-view input requirement could allow quick capture on consumer phones without multi-camera rigs.
If the inpainting network proves robust across skin tones and lighting, the same structure might adapt to other body parts with similar sparse views.
Real-time edge compatibility suggests the avatars could update live during user interaction rather than requiring offline processing.
Faster texture pipelines reduce the data needed per user, which may lower storage and transmission costs in multi-user virtual environments.

Load-bearing premise

The approach assumes semantic mesh alignment plus inpainting will reliably move high-frequency skin details onto the model and produce smooth results that remain consistent from unseen angles.

What would settle it

Rendering the generated avatar from a side or finger-fold viewpoint where seams, blurring, or mismatched skin details become visible compared with ground-truth photos would show the claim does not hold.

Figures

Figures reproduced from arXiv: 2606.03420 by Akanxit Upadhyay, Anmol Namdev, Green Rosh KS, Meghana Shankar, Pawan Prasad BH.

**Figure 2.** Figure 2: Our PHAF pipeline generates multi-view renders [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Our semantic guided alignment along with densified texture extraction extracts UV texture image from High density [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: We showcase our semantic alignment feature, the [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Extracted UV Texture images of the subjects: The un [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 7.** Figure 7: DinoV2 [27] and ViT [46] heatmaps shows that these metrics attend to user specific identifiers which are crucial for personalised hand avatar Beyond raw scores, a key advantage of transformer-based embeddings is interpretability: attention heatmaps from ViT/DINO models consistently localize discriminative patches and can be visualized to verify that the metric attends to user-specific cues (e.g., tattoos,… view at source ↗

**Figure 6.** Figure 6: Qualitative results for the evaluation of the trained [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 8.** Figure 8: Qualitative comparisons of the proposed PHAF with UHM [19], HARP [40] and OHTA [47]. PHAF achieves [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: (a) shows the input Palmar and Dorsal images. (b) [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 10.** Figure 10: Impact of mesh resolution. A dense mesh results [PITH_FULL_IMAGE:figures/full_fig_p008_10.png] view at source ↗

**Figure 11.** Figure 11: Control point alignment (b) results in a much [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗

read the original abstract

We present PHAF-Personalized Hand Avatars in a Flash, a personalized photo-realistic hand avatar which provides high quality multi-view renders from just two images (dorsal and palmar views).Unlike slow optimization-based techniques, PHAF generates fast personalized textures for real-time deployment on edge devices. Our approach combines semantic guided mesh alignment and densified texture extraction to transfer high-frequency details efficiently. A view-based inpainting network refines textures ensuring smooth, continuous appearance. PHAF generalizes to novel viewpoints and leverages a parametric hand model for accurate articulations, making it compatible with standard graphics engines. Experiments show it is comparable to existing methods in visual fidelity while drastically reducing texture generation time by 30 times, enabling practical AR/VR applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PHAF assembles a two-view hand avatar pipeline with claimed 30x speedup, but the abstract supplies no metrics or baselines so the claims stay unverified.

read the letter

The main point is a method that takes dorsal and palmar photos and produces a textured hand avatar fast enough for edge devices. It fits a parametric model with semantic guidance, pulls textures in a densified way, and runs them through a view-based inpainting network to smooth gaps.

That combination is the actual new piece. It targets the speed problem in existing optimization pipelines and keeps compatibility with standard graphics engines, which is useful for AR/VR work.

The approach looks reasonable on paper for filling in missing surface from two views. The inpainting step is meant to handle continuity and high-frequency details that the inputs miss.

The soft spot is the evaluation. The abstract says the results are comparable in fidelity and thirty times faster, yet it gives no numbers, no baselines, no datasets, and no ablation on how well the inpainting holds up on unseen angles. Hands have lots of foreshortened and hidden areas in just two shots, so the generalization claim rests on that network doing a lot of work without shown checks.

This is for people building real-time hand rendering systems or quick avatar tools. A reader who needs a fully validated system will find it thin right now. Someone looking for component ideas on fast texture transfer might still pick up something practical.

It deserves a serious referee if the full paper adds quantitative comparisons and view-generalization tests, because the application area is relevant and the pipeline is concrete enough to review.

Referee Report

2 major / 0 minor

Summary. The paper presents PHAF, a method to create personalized photo-realistic hand avatars from only two input images (dorsal and palmar views). It employs semantic guided mesh alignment, densified texture extraction, and a view-based inpainting network to generate textures for real-time edge-device deployment, leveraging a parametric hand model for articulations. The central claim is that the approach achieves visual fidelity comparable to existing methods while reducing texture generation time by 30x.

Significance. If validated, the 30x speedup and compatibility with standard graphics engines could enable practical AR/VR applications by addressing the slow optimization bottleneck in personalized avatar creation. The work's strength lies in its focus on limited-input generalization for real-time use, but its impact depends on whether the inpainting component reliably transfers high-frequency details across unobserved viewpoints.

major comments (2)

Abstract: the claim that 'experiments show it is comparable to existing methods in visual fidelity while drastically reducing texture generation time by 30 times' provides no metrics, baselines, datasets, or quantitative results, preventing verification of whether the data supports the central speedup and fidelity assertions.
Abstract: the generalization claim for the view-based inpainting network (transferring high-frequency details from only dorsal and palmar views to novel viewpoints while ensuring smooth continuous appearance) is load-bearing for the real-time edge deployment assertion, yet the two input views leave large surface areas unobserved or foreshortened, with no mentioned quantitative ablation on out-of-distribution views or explicit validation of continuity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline proposed revisions to improve clarity and verifiability.

read point-by-point responses

Referee: Abstract: the claim that 'experiments show it is comparable to existing methods in visual fidelity while drastically reducing texture generation time by 30 times' provides no metrics, baselines, datasets, or quantitative results, preventing verification of whether the data supports the central speedup and fidelity assertions.

Authors: We agree that the abstract would be strengthened by including brief quantitative highlights. The body of the manuscript (Section 4) reports specific metrics including PSNR, SSIM, and runtime comparisons against baselines on the evaluated datasets, confirming the 30x speedup and comparable fidelity. We will revise the abstract to incorporate key quantitative results supporting these claims. revision: yes
Referee: Abstract: the generalization claim for the view-based inpainting network (transferring high-frequency details from only dorsal and palmar views to novel viewpoints while ensuring smooth continuous appearance) is load-bearing for the real-time edge deployment assertion, yet the two input views leave large surface areas unobserved or foreshortened, with no mentioned quantitative ablation on out-of-distribution views or explicit validation of continuity.

Authors: The view-based inpainting network is trained on diverse hand poses and views to promote generalization from the two input images, with qualitative results on novel viewpoints provided in the experiments section to illustrate transfer of details and continuity. We acknowledge that an explicit quantitative ablation study focused on out-of-distribution views and continuity metrics is not included. We will add a targeted discussion and supporting ablation in the revised manuscript to address this. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper describes a pipeline of semantic guided mesh alignment, densified texture extraction, and a view-based inpainting network to produce textures from two input views, with claims of 30x speedup and comparable fidelity supported by experimental comparisons to prior methods. No equations, fitted parameters, or derivations are shown that reduce by construction to the inputs (e.g., no self-definitional quantities, no predictions that are statistically forced by fitting, and no load-bearing self-citations). The central claims rest on the independent behavior of the proposed components rather than renaming or smuggling prior results. This is the normal case of a method paper whose validity is externally falsifiable via the reported experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no information on free parameters, axioms, or invented entities is available.

pith-pipeline@v0.9.1-grok · 5664 in / 1109 out tokens · 33495 ms · 2026-06-28T10:48:30.210044+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 25 canonical work pages · 1 internal anchor

[1]

Mahmoud Afifi. 2019. 11K Hands: gender recognition and biometric identification using a large dataset of hand images.Multimedia Tools and Applications(2019). doi:10.1007/s11042-019-7424-8 PHAF: Personalized Hand Avatars in a Flash ICVGIP 2025, December 17–20, 2025, Mandi, India

work page doi:10.1007/s11042-019-7424-8 2019
[2]

Bokhovkin, S

A. Bokhovkin, S. Tulsiani, and A. Dai. 2023. Mesh2Tex: Generating Mesh Textures from Image Queries. InProceedings of ICCV 2023. arXiv:2304.05868 [cs.CV] doi:10.48550/arXiv.2304.05868

work page doi:10.48550/arxiv.2304.05868 2023
[3]

Yujin Chen, Zhigang Tu, Di Kang, Linchao Bao, Ying Zhang, Xuefei Zhe, Ruizhi Chen, and Junsong Yuan. 2021. Model-based 3D Hand Reconstruction via Self- Supervised Learning. arXiv:2103.11703 [cs.CV] https://arxiv.org/abs/2103.11703

arXiv 2021
[4]

Enric Corona, Tomas Hodan, Minh Vo, Francesc Moreno-Noguer, Chris Sweeney, Richard Newcombe, and Lingni Ma. 2022. LISA: Learning Implicit Shape and Appearance of Hands. (2022). arXiv:2204.01695 [cs.CV] doi:10.48550/arXiv.2204. 01695

work page doi:10.48550/arxiv.2204 2022
[5]

Deng et al

K. Deng et al. 2024. FlashTex: Fast Relightable Mesh Texturing with LightCon- trolNet. (2024). arXiv:2402.13251 [cs.CV] doi:10.48550/arXiv.2402.13251

work page doi:10.48550/arxiv.2402.13251 2024
[6]

Reproducible scaling laws for contrastive language-image learning

A. Doe et al. 2023. Handy: Towards a High Fidelity 3D Hand Shape and Appear- ance Model. (2023). doi:10.1109/CVPR52729.2023.00453

work page doi:10.1109/cvpr52729.2023.00453 2023
[7]

Q. Gan, Z. Zhou, and J. Zhu. 2024. XHand: Real-time Expressive Hand Avatar. (2024). arXiv:2407.21002 [cs.CV] doi:10.48550/arXiv.2407.21002

work page doi:10.48550/arxiv.2407.21002 2024
[8]

Gao et al

D. Gao et al. 2022. DART: Articulated Hand Model with Diverse Accessories and Rich Textures. InNeurIPS 2022 - Datasets and Benchmarks Track. doi:ps/ publications/dart2022

2022
[9]

Guo et al

Z. Guo et al. 2023. HandNeRF: Neural Radiance Fields for Animatable Interacting Hands. (2023). arXiv:2303.13825 [cs.CV] doi:10.48550/arXiv.2303.13825

work page doi:10.48550/arxiv.2303.13825 2023
[10]

David Hirshberg, Matthew Loper, Eric Rachlin, and Michael Black. 2012. Coreg- istration: Simultaneous Alignment and Modeling of Articulated 3D Shape. doi:10.1007/978-3-642-33783-3_18

work page doi:10.1007/978-3-642-33783-3_18 2012
[11]

Md Imran Hosen and Md Baharul Islam. 2022. Masked Face Inpainting Through Residual Attention UNet. In2022 Innovations in Intelligent Systems and Applica- tions Conference (ASYU). IEEE, 1–5

2022
[12]

Ivashechkin, O

M. Ivashechkin, O. Mendez, and R. Bowden. 2025. HandOcc: NeRF-based Hand Rendering with Occupancy Networks. (2025). arXiv:2505.02079 [cs.CV] doi:10. 48550/arXiv.2505.02079

arXiv 2025
[13]

Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, and Anurag Ran- jan. 2022. NeuMan: Neural Human Radiance Field from a Single Video. arXiv:2203.12575 [cs.CV] https://arxiv.org/abs/2203.12575

arXiv 2022
[14]

Williams

Zheheng Jiang, Hossein Rahmani, Sue Black, and Bryan M. Williams. 2023. A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image. arXiv:2304.14299 [cs.CV] https://arxiv.org/abs/2304.14299

arXiv 2023
[15]

Minje Kim and Tae-Kyun Kim. 2024. BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands. InProceedings of CVPR 2024. doi:10.48550/arXiv.2403. 08262

work page doi:10.48550/arxiv.2403 2024
[16]

Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything.arXiv:2304.02643(2023)

Pith/arXiv arXiv 2023
[17]

Jiahui Lei et al. 2020. Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images. (2020). arXiv:2008.07760 [cs.CV] doi:10.48550/arXiv.2008.07760

work page doi:10.48550/arxiv.2008.07760 2020
[18]

Liang et al

Y. Liang et al. 2025. UniTEX: Universal High-Fidelity Generative Texturing for 3D Shapes. (2025). arXiv:2505.23253 [cs.GR] doi:10.48550/arXiv.2505.23253

work page doi:10.48550/arxiv.2505.23253 2025
[19]

Liu et al

Y. Liu et al. 2024. UHM: Authentic Hand Avatar from a Phone Scan via Universal Hand Model. (2024). arXiv:2405.07933 [cs.CV] doi:10.48550/arXiv.2405.07933

work page doi:10.48550/arxiv.2405.07933 2024
[20]

Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, and Yaser Sheikh. 2019. Neural volumes: learning dynamic renderable volumes from images.ACM Transactions on Graphics38 (July 2019), 1–14. doi:10. 1145/3306346.3323020

arXiv 2019
[21]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model.ACM Trans. Graphics (Proc. SIGGRAPH Asia)(2015), 248:1–248:16. doi:10.1145/2816795. 2818013

work page doi:10.1145/2816795 2015
[22]

Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, and Shunsuke Saito. 2022. KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints. arXiv:2205.04992 [cs.CV] https://arxiv. org/abs/2205.04992

arXiv 2022
[23]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. arXiv:2003.08934 [cs.CV] doi:10.48550/arXiv.2003.08934

work page doi:10.48550/arxiv.2003.08934 2020
[24]

Aymen Mir, Thiemo Alldieck, and Gerard Pons-Moll. 2020. Learning to Transfer Texture from Clothing Images to 3D Humans. (2020). arXiv:2003.02050 [cs.CV] doi:10.48550/arXiv.2003.02050

work page doi:10.48550/arxiv.2003.02050 2020
[25]

Gyeongsik Moon. 2023. Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild. InCVPR

2023
[26]

Tuur Stuyck Nikolaos Sarafianos et al. 2024. Garment3DGen: 3D Garment Styl- ization and Texture Generation. (2024). arXiv:2403.18816 [cs.CV] doi:10.48550/ arXiv.2403.18816

arXiv 2024
[27]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Lab...

Pith/arXiv arXiv 2024
[28]

Barron, Sofien Bouaziz, Dan B Gold- man, Steven M

Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Gold- man, Steven M. Seitz, and Ricardo Martin-Brualla. 2021. Nerfies: Deformable Neu- ral Radiance Fields. arXiv:2011.12948 [cs.CV] https://arxiv.org/abs/2011.12948

arXiv 2021
[29]

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. 2021. Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies. arXiv:2105.02872 [cs.CV] https://arxiv.org/abs/2105. 02872

arXiv 2021
[30]

Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. arXiv:2012.15838 [cs.CV] https://arxiv.org/abs/2012.15838

arXiv 2021
[32]

Gerard Pons-Moll, Javier Romero, Naureen Mahmood, and Michael Black. 2015. Dyna: A Model of Dynamic Human Shape in Motion. (2015). doi:10.15496/ publikation-10602

2015
[33]

arXiv preprint arXiv:2302.01721 , year=

E. Richardson et al. 2023. TEXTure: Text-Guided Texturing of 3D Shapes. (2023). arXiv:2302.01721 [cs.GR] doi:10.48550/arXiv.2302.01721

work page doi:10.48550/arxiv.2302.01721 2023
[34]

Javier Romero, Dimitrios Tzionas, and Michael J. Black. 2017. Embodied Hands: Modeling and Capturing Hands and Bodies Together.ACM Transactions on Graphics, (Proc. SIGGRAPH Asia)36, 6 (Nov. 2017)

2017
[35]

Smith et al

J. Smith et al . 2020. HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization. (2020)

2020
[36]

Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. 2005. SCAPE: Shape completion and animation of people.ACM Transac- tions on Graphics (TOG)(2005), 408–416. doi:10.1145/1073204.1073207

work page doi:10.1145/1073204.1073207 2005
[37]

David Svitov, Dmitrii Gudkov, Renat Bashirov, and Victor Lempitsky. 2023. DI- NAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars. arXiv:2303.09375 [cs.CV] https://arxiv.org/abs/2303.09375

arXiv 2023
[38]

Wang et al

H. Wang et al. 2023. HandAvatar: Free-Pose Hand Animation & Rendering from Monocular Video. (2023). arXiv:2211.12782 [cs.CV] doi:10.48550/arXiv.2211.12782

work page doi:10.48550/arxiv.2211.12782 2023
[39]

Srinivasan, Jonathan T

Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. 2022. HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video. arXiv:2201.04127 [cs.CV] https://arxiv. org/abs/2201.04127

arXiv 2022
[40]

Xu et al

Y. Xu et al. 2023. HARP: Personalized Hand Reconstruction from a Monocular RGB Video. (2023). arXiv:2212.09530 [cs.CV] doi:10.48550/arXiv.2212.09530

work page doi:10.48550/arxiv.2212.09530 2023
[41]

Kim Youwang, Tae-Hyun Oh, and Gerard Pons-Moll. 2024. Paint-it: Text- to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. (2024). arXiv:2312.11360 [cs.CV] doi:10.48550/arXiv. 2312.11360

work page internal anchor Pith review doi:10.48550/arxiv 2024
[42]

X. Zeng, X. Chen, Z. Qi, et al. 2024. Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models. InProceedings of CVPR 2024. arXiv:2312.13913 [cs.CV] doi:10.48550/arXiv.2312.13913

work page doi:10.48550/arxiv.2312.13913 2024
[43]

Cheng Zhang et al . 2024. FabricDiffusion: High-Fidelity Texture Transfer for 3D Garments Generation from In-The-Wild Clothing Images. (2024). arXiv:2410.01801 [cs.CV] doi:10.48550/arXiv.2410.01801

work page doi:10.48550/arxiv.2410.01801 2024
[44]

Juze Zhang et al. 2022. Nimble: a non-rigid hand model with bones and muscles. (2022). arXiv:2202.04533 [cs.CV] doi:10.48550/arXiv.2202.04533

work page doi:10.48550/arxiv.2202.04533 2022
[45]

Jian Zhao and Hui Zhang. 2022. Thin-Plate Spline Motion Model for Image Animation. arXiv:2203.14367 [cs.CV] https://arxiv.org/abs/2203.14367

arXiv 2022
[46]

Ziwei Zhao, David Leake, Xiaomeng Ye, and David Crandall. 2024. Case-Enhanced Vision Transformer: Improving Explanations of Image Similarity with a ViT-based Similarity Metric. arXiv:2407.16981 [cs.CV] https://arxiv.org/abs/2407.16981

arXiv 2024
[47]

Zheng, C

X. Zheng, C. Wen, Z. Su, et al. 2024. OHTA: One-shot Hand Avatar via Data-driven Implicit Priors. (2024). arXiv:2402.18969 [cs.CV] doi:10.48550/arXiv.2402.18969

work page doi:10.48550/arxiv.2402.18969 2024
[48]

J. Zhu, Z. Zhao, L. Yang, and A. Yao. 2023. HiFiHR: High-Fidelity Texture for 3D Hand Reconstruction. (2023). arXiv:2308.13628 [cs.CV] doi:10.48550/arXiv.2308. 13628

work page doi:10.48550/arxiv.2308 2023

[1] [1]

Mahmoud Afifi. 2019. 11K Hands: gender recognition and biometric identification using a large dataset of hand images.Multimedia Tools and Applications(2019). doi:10.1007/s11042-019-7424-8 PHAF: Personalized Hand Avatars in a Flash ICVGIP 2025, December 17–20, 2025, Mandi, India

work page doi:10.1007/s11042-019-7424-8 2019

[2] [2]

Bokhovkin, S

A. Bokhovkin, S. Tulsiani, and A. Dai. 2023. Mesh2Tex: Generating Mesh Textures from Image Queries. InProceedings of ICCV 2023. arXiv:2304.05868 [cs.CV] doi:10.48550/arXiv.2304.05868

work page doi:10.48550/arxiv.2304.05868 2023

[3] [3]

Yujin Chen, Zhigang Tu, Di Kang, Linchao Bao, Ying Zhang, Xuefei Zhe, Ruizhi Chen, and Junsong Yuan. 2021. Model-based 3D Hand Reconstruction via Self- Supervised Learning. arXiv:2103.11703 [cs.CV] https://arxiv.org/abs/2103.11703

arXiv 2021

[4] [4]

Enric Corona, Tomas Hodan, Minh Vo, Francesc Moreno-Noguer, Chris Sweeney, Richard Newcombe, and Lingni Ma. 2022. LISA: Learning Implicit Shape and Appearance of Hands. (2022). arXiv:2204.01695 [cs.CV] doi:10.48550/arXiv.2204. 01695

work page doi:10.48550/arxiv.2204 2022

[5] [5]

Deng et al

K. Deng et al. 2024. FlashTex: Fast Relightable Mesh Texturing with LightCon- trolNet. (2024). arXiv:2402.13251 [cs.CV] doi:10.48550/arXiv.2402.13251

work page doi:10.48550/arxiv.2402.13251 2024

[6] [6]

Reproducible scaling laws for contrastive language-image learning

A. Doe et al. 2023. Handy: Towards a High Fidelity 3D Hand Shape and Appear- ance Model. (2023). doi:10.1109/CVPR52729.2023.00453

work page doi:10.1109/cvpr52729.2023.00453 2023

[7] [7]

Q. Gan, Z. Zhou, and J. Zhu. 2024. XHand: Real-time Expressive Hand Avatar. (2024). arXiv:2407.21002 [cs.CV] doi:10.48550/arXiv.2407.21002

work page doi:10.48550/arxiv.2407.21002 2024

[8] [8]

Gao et al

D. Gao et al. 2022. DART: Articulated Hand Model with Diverse Accessories and Rich Textures. InNeurIPS 2022 - Datasets and Benchmarks Track. doi:ps/ publications/dart2022

2022

[9] [9]

Guo et al

Z. Guo et al. 2023. HandNeRF: Neural Radiance Fields for Animatable Interacting Hands. (2023). arXiv:2303.13825 [cs.CV] doi:10.48550/arXiv.2303.13825

work page doi:10.48550/arxiv.2303.13825 2023

[10] [10]

David Hirshberg, Matthew Loper, Eric Rachlin, and Michael Black. 2012. Coreg- istration: Simultaneous Alignment and Modeling of Articulated 3D Shape. doi:10.1007/978-3-642-33783-3_18

work page doi:10.1007/978-3-642-33783-3_18 2012

[11] [11]

Md Imran Hosen and Md Baharul Islam. 2022. Masked Face Inpainting Through Residual Attention UNet. In2022 Innovations in Intelligent Systems and Applica- tions Conference (ASYU). IEEE, 1–5

2022

[12] [12]

Ivashechkin, O

M. Ivashechkin, O. Mendez, and R. Bowden. 2025. HandOcc: NeRF-based Hand Rendering with Occupancy Networks. (2025). arXiv:2505.02079 [cs.CV] doi:10. 48550/arXiv.2505.02079

arXiv 2025

[13] [13]

Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, and Anurag Ran- jan. 2022. NeuMan: Neural Human Radiance Field from a Single Video. arXiv:2203.12575 [cs.CV] https://arxiv.org/abs/2203.12575

arXiv 2022

[14] [14]

Williams

Zheheng Jiang, Hossein Rahmani, Sue Black, and Bryan M. Williams. 2023. A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image. arXiv:2304.14299 [cs.CV] https://arxiv.org/abs/2304.14299

arXiv 2023

[15] [15]

Minje Kim and Tae-Kyun Kim. 2024. BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands. InProceedings of CVPR 2024. doi:10.48550/arXiv.2403. 08262

work page doi:10.48550/arxiv.2403 2024

[16] [16]

Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything.arXiv:2304.02643(2023)

Pith/arXiv arXiv 2023

[17] [17]

Jiahui Lei et al. 2020. Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images. (2020). arXiv:2008.07760 [cs.CV] doi:10.48550/arXiv.2008.07760

work page doi:10.48550/arxiv.2008.07760 2020

[18] [18]

Liang et al

Y. Liang et al. 2025. UniTEX: Universal High-Fidelity Generative Texturing for 3D Shapes. (2025). arXiv:2505.23253 [cs.GR] doi:10.48550/arXiv.2505.23253

work page doi:10.48550/arxiv.2505.23253 2025

[19] [19]

Liu et al

Y. Liu et al. 2024. UHM: Authentic Hand Avatar from a Phone Scan via Universal Hand Model. (2024). arXiv:2405.07933 [cs.CV] doi:10.48550/arXiv.2405.07933

work page doi:10.48550/arxiv.2405.07933 2024

[20] [20]

Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, and Yaser Sheikh. 2019. Neural volumes: learning dynamic renderable volumes from images.ACM Transactions on Graphics38 (July 2019), 1–14. doi:10. 1145/3306346.3323020

arXiv 2019

[21] [21]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model.ACM Trans. Graphics (Proc. SIGGRAPH Asia)(2015), 248:1–248:16. doi:10.1145/2816795. 2818013

work page doi:10.1145/2816795 2015

[22] [22]

Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, and Shunsuke Saito. 2022. KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints. arXiv:2205.04992 [cs.CV] https://arxiv. org/abs/2205.04992

arXiv 2022

[23] [23]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. arXiv:2003.08934 [cs.CV] doi:10.48550/arXiv.2003.08934

work page doi:10.48550/arxiv.2003.08934 2020

[24] [24]

Aymen Mir, Thiemo Alldieck, and Gerard Pons-Moll. 2020. Learning to Transfer Texture from Clothing Images to 3D Humans. (2020). arXiv:2003.02050 [cs.CV] doi:10.48550/arXiv.2003.02050

work page doi:10.48550/arxiv.2003.02050 2020

[25] [25]

Gyeongsik Moon. 2023. Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild. InCVPR

2023

[26] [26]

Tuur Stuyck Nikolaos Sarafianos et al. 2024. Garment3DGen: 3D Garment Styl- ization and Texture Generation. (2024). arXiv:2403.18816 [cs.CV] doi:10.48550/ arXiv.2403.18816

arXiv 2024

[27] [27]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Lab...

Pith/arXiv arXiv 2024

[28] [28]

Barron, Sofien Bouaziz, Dan B Gold- man, Steven M

Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Gold- man, Steven M. Seitz, and Ricardo Martin-Brualla. 2021. Nerfies: Deformable Neu- ral Radiance Fields. arXiv:2011.12948 [cs.CV] https://arxiv.org/abs/2011.12948

arXiv 2021

[29] [29]

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. 2021. Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies. arXiv:2105.02872 [cs.CV] https://arxiv.org/abs/2105. 02872

arXiv 2021

[30] [30]

Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. arXiv:2012.15838 [cs.CV] https://arxiv.org/abs/2012.15838

arXiv 2021

[31] [32]

Gerard Pons-Moll, Javier Romero, Naureen Mahmood, and Michael Black. 2015. Dyna: A Model of Dynamic Human Shape in Motion. (2015). doi:10.15496/ publikation-10602

2015

[32] [33]

arXiv preprint arXiv:2302.01721 , year=

E. Richardson et al. 2023. TEXTure: Text-Guided Texturing of 3D Shapes. (2023). arXiv:2302.01721 [cs.GR] doi:10.48550/arXiv.2302.01721

work page doi:10.48550/arxiv.2302.01721 2023

[33] [34]

Javier Romero, Dimitrios Tzionas, and Michael J. Black. 2017. Embodied Hands: Modeling and Capturing Hands and Bodies Together.ACM Transactions on Graphics, (Proc. SIGGRAPH Asia)36, 6 (Nov. 2017)

2017

[34] [35]

Smith et al

J. Smith et al . 2020. HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization. (2020)

2020

[35] [36]

Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. 2005. SCAPE: Shape completion and animation of people.ACM Transac- tions on Graphics (TOG)(2005), 408–416. doi:10.1145/1073204.1073207

work page doi:10.1145/1073204.1073207 2005

[36] [37]

David Svitov, Dmitrii Gudkov, Renat Bashirov, and Victor Lempitsky. 2023. DI- NAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars. arXiv:2303.09375 [cs.CV] https://arxiv.org/abs/2303.09375

arXiv 2023

[37] [38]

Wang et al

H. Wang et al. 2023. HandAvatar: Free-Pose Hand Animation & Rendering from Monocular Video. (2023). arXiv:2211.12782 [cs.CV] doi:10.48550/arXiv.2211.12782

work page doi:10.48550/arxiv.2211.12782 2023

[38] [39]

Srinivasan, Jonathan T

Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. 2022. HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video. arXiv:2201.04127 [cs.CV] https://arxiv. org/abs/2201.04127

arXiv 2022

[39] [40]

Xu et al

Y. Xu et al. 2023. HARP: Personalized Hand Reconstruction from a Monocular RGB Video. (2023). arXiv:2212.09530 [cs.CV] doi:10.48550/arXiv.2212.09530

work page doi:10.48550/arxiv.2212.09530 2023

[40] [41]

Kim Youwang, Tae-Hyun Oh, and Gerard Pons-Moll. 2024. Paint-it: Text- to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. (2024). arXiv:2312.11360 [cs.CV] doi:10.48550/arXiv. 2312.11360

work page internal anchor Pith review doi:10.48550/arxiv 2024

[41] [42]

X. Zeng, X. Chen, Z. Qi, et al. 2024. Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models. InProceedings of CVPR 2024. arXiv:2312.13913 [cs.CV] doi:10.48550/arXiv.2312.13913

work page doi:10.48550/arxiv.2312.13913 2024

[42] [43]

Cheng Zhang et al . 2024. FabricDiffusion: High-Fidelity Texture Transfer for 3D Garments Generation from In-The-Wild Clothing Images. (2024). arXiv:2410.01801 [cs.CV] doi:10.48550/arXiv.2410.01801

work page doi:10.48550/arxiv.2410.01801 2024

[43] [44]

Juze Zhang et al. 2022. Nimble: a non-rigid hand model with bones and muscles. (2022). arXiv:2202.04533 [cs.CV] doi:10.48550/arXiv.2202.04533

work page doi:10.48550/arxiv.2202.04533 2022

[44] [45]

Jian Zhao and Hui Zhang. 2022. Thin-Plate Spline Motion Model for Image Animation. arXiv:2203.14367 [cs.CV] https://arxiv.org/abs/2203.14367

arXiv 2022

[45] [46]

Ziwei Zhao, David Leake, Xiaomeng Ye, and David Crandall. 2024. Case-Enhanced Vision Transformer: Improving Explanations of Image Similarity with a ViT-based Similarity Metric. arXiv:2407.16981 [cs.CV] https://arxiv.org/abs/2407.16981

arXiv 2024

[46] [47]

Zheng, C

X. Zheng, C. Wen, Z. Su, et al. 2024. OHTA: One-shot Hand Avatar via Data-driven Implicit Priors. (2024). arXiv:2402.18969 [cs.CV] doi:10.48550/arXiv.2402.18969

work page doi:10.48550/arxiv.2402.18969 2024

[47] [48]

J. Zhu, Z. Zhao, L. Yang, and A. Yao. 2023. HiFiHR: High-Fidelity Texture for 3D Hand Reconstruction. (2023). arXiv:2308.13628 [cs.CV] doi:10.48550/arXiv.2308. 13628

work page doi:10.48550/arxiv.2308 2023