arXiv: 2405.20330 · v4 · submitted 2024-05-30 · cs.CV · cs.AI · cs.GR

OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer

Pith reviewed 2026-05-24 01:20 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.GR
keywords: hand mesh recovery · 4D reconstruction · transformer · two-hand interaction · relation-aware tokenization · monocular input · multi-view · hand pose

The pith

OmniHands recovers 4D interactive hand meshes from monocular or multi-view inputs by embedding positional relations into tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OmniHands as a universal transformer architecture for recovering interactive hand meshes along with their relative movements from single or multiple camera views. It targets two gaps in prior work: no single model handles both single-hand and two-hand image inputs, and existing methods ignore the spatial layout between the two hands. Relation-aware Two-Hand Tokenization embeds the positional relationship directly into the tokens so the same network can process both input types and make use of that layout. A 4D Interaction Reasoning module then fuses the tokens over time using attention before decoding them into 3D meshes and relative motion. The authors report stronger results on standard benchmarks and real-world video for reconstructing hand interactions.

Core claim

OmniHands supplies a single architecture that adapts to varied hand-image tasks through new tokenization and fusion steps. Relation-aware Two-Hand Tokenization places positional relation data inside the hand tokens, letting the network treat single-hand and two-hand cases uniformly while still using their relative positions. The 4D Interaction Reasoning module fuses these tokens across time with attention and produces 3D hand meshes together with their temporal movements, which supports accurate recovery of detailed hand interactions.
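
To make the fusion step concrete, here is a minimal sketch of attention applied across a sequence of per-frame hand tokens. The abstract only says that tokens are fused in 4D with attention, so everything else here is an assumption for illustration: the function names `softmax` and `fuse_over_time`, the use of one token per frame, and the omission of learned query/key/value projections, multiple heads, and positional encodings. It is not the paper's FIR module.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_over_time(tokens: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention over a sequence of frame tokens.

    Assumption: queries, keys, and values are the tokens themselves
    (no learned projections), which keeps the sketch dependency-free.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in tokens:
        # Similarity of this frame's token to every frame in the sequence.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Weighted average of all frame tokens, one output token per frame.
        fused.append([sum(w * tok[i] for w, tok in zip(weights, tokens))
                      for i in range(dim)])
    return fused
```

Each output token is a convex combination of the input tokens, so identical frames pass through unchanged while differing frames are blended according to their similarity.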

What carries the argument

Relation-aware Two-Hand Tokenization (RAT), which inserts positional relation data into hand tokens so one network can handle single-hand and two-hand cases while using their spatial layout.
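
One way to picture relation-aware tokenization is sketched below. The source does not specify how the positional relation is encoded, so the encoding here is a hypothetical choice: each hand is assumed to come with a square crop box `(center_x, center_y, side)`, and the other hand's offset and log scale ratio, normalized by the current hand's crop size, are appended to the feature vector together with a presence flag. The names `relation_descriptor` and `relation_aware_token` are illustrative only.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical crop description: (center_x, center_y, side_length) in pixels.
Box = Tuple[float, float, float]

def relation_descriptor(own: Box, other: Optional[Box]) -> List[float]:
    """Return [dx, dy, log_scale_ratio, present] for `other` relative to `own`.

    Assumes side lengths are positive. A missing second hand yields zeros
    with present = 0.0, so single-hand inputs keep the same token length.
    """
    if other is None:
        return [0.0, 0.0, 0.0, 0.0]
    own_x, own_y, own_side = own
    other_x, other_y, other_side = other
    return [
        (other_x - own_x) / own_side,        # horizontal offset in crop units
        (other_y - own_y) / own_side,        # vertical offset in crop units
        math.log(other_side / own_side),     # relative scale of the two crops
        1.0,                                 # the other hand is present
    ]

def relation_aware_token(feature: List[float], own: Box,
                         other: Optional[Box] = None) -> List[float]:
    """Append the relation descriptor to a per-hand feature vector."""
    return list(feature) + relation_descriptor(own, other)
```

With a zeroed descriptor and presence flag 0 for a missing second hand, single-hand and two-hand inputs produce tokens of the same length, which is the property the review attributes to RAT.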

Load-bearing premise

Embedding positional relation information via RAT lets a single network process both single-hand and two-hand inputs while making explicit use of their spatial relationship.

What would settle it

On a two-hand interaction benchmark, a model variant without the RAT module would need to match the full model's accuracy on relative hand position and interaction reconstruction to undermine the tokenization step.
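
The comparison described above needs a measure of relative hand position. The source does not name one; the sketch below assumes a relative-root position error in the style of the MRRPE metric reported on InterHand2.6M, that is, the Euclidean distance between the predicted and ground-truth offset of the left-hand root from the right-hand root. Function names and the sample layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]  # 3D root joint position, e.g. in millimetres

def relative_root_error(pred_left: Point, pred_right: Point,
                        gt_left: Point, gt_right: Point) -> float:
    """Distance between predicted and true left-root offsets from the right root."""
    pred_offset = [l - r for l, r in zip(pred_left, pred_right)]
    gt_offset = [l - r for l, r in zip(gt_left, gt_right)]
    return math.dist(pred_offset, gt_offset)

def mean_relative_root_error(
        samples: Sequence[Tuple[Point, Point, Point, Point]]) -> float:
    """Average the per-sample error over a set of two-hand frames."""
    return sum(relative_root_error(*s) for s in samples) / len(samples)
```

Under this assumed metric, the ablation would compare `mean_relative_root_error` for the full model against the variant without RAT on the same frames.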

Figures

Figures reproduced from arXiv: 2405.20330 by Dixuan Lin, Hongwen Zhang, Mengcheng Li, Qianying Wang, Qi Yan, Wei Jing, Yebin Liu, Yuxiang Zhang.

Figure 1: The proposed method, OmniHands, can robustly recover interactive hand motions and their relative movement from monocular inputs.
Figure 2: Overview of our OmniHands framework. OmniHands is a transformer-based network, which takes various forms of inputs and estimates two-hand ...
Figure 3: Qualitative Comparison. We compare OmniHands with state-of-the-art methods on in-the-wild datasets ARCTIC [Fan et al ...
Figure 4: Qualitative Results of in-the-wild hard cases. We show our model’s results on complex cross-hand interactions in realistic scenarios to demonstrate its ...
Figure 5: The qualitative results of OmniHands with multi-view inputs. We present the results on Interhand2.6m [Moon et al ...
Figure 6: Ablation study on occlusion case. ’Full Model’ is our model with ...
Figure 7: Ablation study of RAT. Two views of predicted hand meshes are ...

Original abstract

In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://OmniHand.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces OmniHands, a universal transformer-based architecture for 4D hand mesh recovery from monocular or multi-view inputs. It proposes Relation-aware Two-Hand Tokenization (RAT) to embed positional relation information into hand tokens, enabling a single network to process both single-hand and two-hand inputs while explicitly using their spatial relationship. A 4D Interaction Reasoning (FIR) module is developed to fuse hand tokens in 4D via attention and decode them into 3D meshes and relative temporal movements. The approach is claimed to be validated on benchmark datasets with superior performance on in-the-wild videos and real-world scenarios.

Significance. If the empirical results and ablations substantiate the claims, the work would provide a unified framework that explicitly models hand interactions, addressing limitations of prior methods that lack unified handling of input types or neglect positional relationships, with potential impact on interactive hand reconstruction tasks.

minor comments (1)
  1. [Abstract] The abstract asserts validation on benchmarks and superior performance on in-the-wild videos yet supplies no quantitative numbers, error bars, ablation results, or derivation details, which prevents verification of whether the data support the central claim about RAT and FIR.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of OmniHands and for noting its potential significance as a unified framework for interactive hand reconstruction, conditional on the empirical results. We observe that the recommendation is 'uncertain' and that no specific major comments were listed under the MAJOR COMMENTS section. We therefore provide no point-by-point responses but remain available to supply additional details on the experiments, ablations, or any other aspect of the manuscript to address the uncertainty regarding substantiation of the claims.

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents a new transformer-based architecture (OmniHands) with a proposed Relation-aware Two-Hand Tokenization (RAT) module and 4D Interaction Reasoning (FIR) module as design choices for handling single- and two-hand inputs. No equations, fitted parameters, or predictions are shown that reduce by construction to the inputs; the central claims rest on the architectural description and empirical validation on benchmarks rather than self-referential definitions or load-bearing self-citations. The derivation chain is self-contained as a proposed method without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters or axioms; the approach appears to rest on standard transformer assumptions plus the two newly proposed modules.

pith-pipeline@v0.9.0 · 5797 in / 1034 out tokens · 30067 ms · 2026-05-24T01:20:06.769021+00:00 · methodology

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 1 internal anchor

  2. [2]

    Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering

    Seungryul Baek, Kwang In Kim, and Tae-Kyun Kim. Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

  3. [3]

    3d hand shape and pose from images in the wild

    Adnane Boukhayma, Rodrigo de Bem, and Philip HS Torr. 3d hand shape and pose from images in the wild. In CVPR, 2019

  4. [4]

    DexYCB: A benchmark for capturing hand grasping of objects

    Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, and Dieter Fox. DexYCB: A benchmark for capturing hand grasping of objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  5. [5]

    Camera-space hand mesh recovery via semantic aggregation and adaptive 2d-1d registration

    Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, and Wen Zheng. Camera-space hand mesh recovery via semantic aggregation and adaptive 2d-1d registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  6. [6]

    Mobrecon: Mobile-friendly hand mesh reconstruction from monocular image

    Xingyu Chen, Yufeng Liu, Dong Yajiao, Xiong Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, and Xiaoyan Guo. Mobrecon: Mobile-friendly hand mesh reconstruction from monocular image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  7. [7]

    Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose

    Hongsuk Choi, Gyeongsik Moon, and Kyoung Mu Lee. Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In European Conference on Computer Vision (ECCV), 2020

  8. [8]

    Beyond static features for temporally consistent 3d human pose and shape from a video

    Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee. Beyond static features for temporally consistent 3d human pose and shape from a video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1964--1973, 2021

  9. [9]

    Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Monocular expressive body regression through body-driven attention. In European Conference on Computer Vision (ECCV), 2020

  10. [10]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

  11. [11]

    Arctic: A dataset for dexterous bimanual hand-object manipulation

    Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. Arctic: A dataset for dexterous bimanual hand-object manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12943--12954, 2023

  12. [12]

    Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time

    Hao-Shu Fang, Jiefeng Li, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, and Cewu Lu. Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

  13. [13]

    Yao Feng, Vasileios Choutas, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Collaborative regression of expressive bodies using moderation. In International Conference on 3D Vision (3DV), 2021

  14. [14]

    Deformer: Dynamic fusion transformer for robust hand pose estimation

    Qichen Fu, Xingyu Liu, Ran Xu, Juan Carlos Niebles, and Kris M Kitani. Deformer: Dynamic fusion transformer for robust hand pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23600--23611, 2023

  15. [15]

    3D hand shape and pose estimation from a single RGB image

    Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, and Junsong Yuan. 3D hand shape and pose estimation from a single RGB image. In CVPR, 2019

  16. [16]

    Humans in 4d: Reconstructing and tracking humans with transformers

    Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4d: Reconstructing and tracking humans with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14783--14794, 2023

  17. [17]

    Honnotate: A method for 3d annotation of hand and object poses

    Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vincent Lepetit. Honnotate: A method for 3d annotation of hand and object poses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3196--3206, 2020

  18. [18]

    Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction

    Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, and Cordelia Schmid. Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In CVPR, 2020

  19. [19]

    Whole-body human pose estimation in the wild

    Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo. Whole-body human pose estimation in the wild. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part IX 16, pages 196--214. Springer, 2020

  20. [20]

    Total capture: A 3D deformation model for tracking faces, hands, and bodies

    Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total capture: A 3D deformation model for tracking faces, hands, and bodies. In CVPR, 2018

  21. [21]

    End-to-end recovery of human shape and pose

    Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In CVPR, pages 7122--7131, 2018

  22. [22]

    End-to-end detection and pose estimation of two interacting hands

    Dong Uk Kim, Kwang In Kim, and Seungryul Baek. End-to-end detection and pose estimation of two interacting hands. In ICCV, 2021

  23. [23]

    Vibe: Video inference for human body pose and shape estimation

    Muhammed Kocabas, Nikos Athanasiou, and Michael J Black. Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5253--5263, 2020

  24. [24]

    Learning to reconstruct 3d human pose and shape via model-fitting in the loop

    Nikos Kolotouros, Georgios Pavlakos, Michael J Black, and Kostas Daniilidis. Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In ICCV, pages 2252--2261, 2019

  25. [25]

    Convolutional mesh regression for single-image human shape reconstruction

    Nikos Kolotouros, Georgios Pavlakos, and Kostas Daniilidis. Convolutional mesh regression for single-image human shape reconstruction. In CVPR, 2019

  26. [26]

    Weakly-supervised mesh-convolutional hand reconstruction in the wild

    Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, and Stefanos Zafeiriou. Weakly-supervised mesh-convolutional hand reconstruction in the wild. In CVPR, 2020

  27. [27]

    Renderih: A large-scale synthetic dataset for 3d interacting hand pose estimation

    Lijun Li, Linrui Tian, Xindi Zhang, Qi Wang, Bang Zhang, Liefeng Bo, Mengyuan Liu, and Chen Chen. Renderih: A large-scale synthetic dataset for 3d interacting hand pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20395--20405, 2023

  28. [28]

    Interacting attention graph for single image two-hand reconstruction

    Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, and Yebin Liu. Interacting attention graph for single image two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2761--2770, 2022

  29. [29]

    HHMR: Holistic hand mesh recovery by enhancing the multimodal controllability of graph diffusion models

    Mengcheng Li, Hongwen Zhang, Yuxiang Zhang, Ruizhi Shao, Tao Yu, and Yebin Liu. HHMR: Holistic hand mesh recovery by enhancing the multimodal controllability of graph diffusion models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024

  30. [30]

    End-to-end human pose and mesh reconstruction with transformers

    Kevin Lin, Lijuan Wang, and Zicheng Liu. End-to-end human pose and mesh reconstruction with transformers. In CVPR, 2021

  31. [31]

    Mesh graphormer

    Kevin Lin, Lijuan Wang, and Zicheng Liu. Mesh graphormer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12939--12948, 2021

  32. [32]

    Semi-supervised 3d hand-object poses estimation with interactions in time

    Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, and Xiaolong Wang. Semi-supervised 3d hand-object poses estimation with interactions in time. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2021

  33. [33]

    3D interacting hand pose estimation by hand de-occlusion and removal

    Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, and Ping Luo. 3D interacting hand pose estimation by hand de-occlusion and removal. In ECCV, pages 380--397. Springer, 2022

  34. [34]

    Bringing inputs to shared domains for 3d interacting hands recovery in the wild

    Gyeongsik Moon. Bringing inputs to shared domains for 3d interacting hands recovery in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17028--17037, 2023

  35. [35]

    InterHand2.6M: A dataset and baseline for 3d interacting hand pose estimation from a single rgb image

    Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, and Kyoung Mu Lee. InterHand2.6M: A dataset and baseline for 3d interacting hand pose estimation from a single rgb image. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XX 16, pages 548--564. Springer, 2020

  36. [36]

    Accurate 3d hand pose estimation for whole-body 3d human mesh estimation

    Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Accurate 3d hand pose estimation for whole-body 3d human mesh estimation. In Computer Vision and Pattern Recognition Workshop (CVPRW), 2022

  37. [37]

    A dataset of relighted 3d interacting hands

    Gyeongsik Moon, Shunsuke Saito, Weipeng Xu, Rohan Joshi, Julia Buffalini, Harley Bellan, Nicholas Rosen, Jesse Richardson, Mallorie Mize, Philippe De Bree, et al. A dataset of relighted 3d interacting hands. Advances in Neural Information Processing Systems, 36, 2024

  38. [38]

    Handoccnet: Occlusion-robust 3d hand mesh estimation network

    JoonKyu Park, Yeonguk Oh, Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Handoccnet: Occlusion-robust 3d hand mesh estimation network. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  39. [39]

    Pytorch: An imperative style, high-performance deep learning library

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019

  40. [40]

    Reconstructing hands in 3d with transformers

    Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, and Jitendra Malik. Reconstructing hands in 3d with transformers. arXiv preprint arXiv:2312.05251, 2023

  41. [41]

    Decoupled iterative refinement framework for interacting hands reconstruction from a single rgb image

    Pengfei Ren, Chao Wen, Xiaozheng Zheng, Zhou Xue, Haifeng Sun, Qi Qi, Jingyu Wang, and Jianxin Liao. Decoupled iterative refinement framework for interacting hands reconstruction from a single rgb image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8014--8025, 2023

  42. [42]

    Javier Romero, Dimitrios Tzionas, and Michael J. Black. Embodied hands: Modeling and capturing hands and bodies together. In SIGGRAPH Asia, 2017

  43. [43]

    Frankmocap: Fast monocular 3D hand and body motion capture by regression and integration

    Yu Rong, Takaaki Shiratori, and Hanbyul Joo. Frankmocap: Fast monocular 3D hand and body motion capture by regression and integration. In ICCVW, 2021

  44. [44]

    Hand keypoint detection in single images using multiview bootstrapping

    Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh. Hand keypoint detection in single images using multiview bootstrapping. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 1145--1153, 2017

  45. [45]

    Towards accurate alignment in real-time 3D hand-mesh reconstruction

    Xiao Tang, Tianyu Wang, and Chi-Wing Fu. Towards accurate alignment in real-time 3D hand-mesh reconstruction. In ICCV, 2021

  46. [46]

    Recovering 3D human mesh from monocular images: A survey

    Yating Tian, Hongwen Zhang, Yebin Liu, and Limin Wang. Recovering 3D human mesh from monocular images: A survey. IEEE transactions on pattern analysis and machine intelligence, 2023

  47. [47]

    Consistent 3d hand reconstruction in video via self-supervised learning

    Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, and Junsong Yuan. Consistent 3d hand reconstruction in video via self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

  48. [48]

    Capturing hands in action using discriminative salient points and physics simulation

    Dimitrios Tzionas, Luca Ballan, Abhilash Srikantha, Pablo Aponte, Marc Pollefeys, and Juergen Gall. Capturing hands in action using discriminative salient points and physics simulation. International Journal of Computer Vision, 118:172--193, 2016

  49. [49]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6000--6010, Red Hook, NY, USA, 2017. Curran Associates Inc

  50. [50]

    Memahand: Exploiting mesh-mano interaction for single image two-hand reconstruction

    Congyi Wang, Feida Zhu, and Shilei Wen. Memahand: Exploiting mesh-mano interaction for single image two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 564--573, 2023

  51. [51]

    Monocular total capture: Posing face, body, and hands in the wild

    Donglai Xiang, Hanbyul Joo, and Yaser Sheikh. Monocular total capture: Posing face, body, and hands in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10965--10974, 2019

  52. [52]

    Vitpose: Simple vision transformer baselines for human pose estimation

    Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao. Vitpose: Simple vision transformer baselines for human pose estimation. Advances in Neural Information Processing Systems, 35:38571--38584, 2022

  53. [53]

    Seqhand: Rgb-sequence-based 3d hand pose and shape estimation

    John Yang, Hyung Jin Chang, Seungeui Lee, and Nojun Kwak. Seqhand: Rgb-sequence-based 3d hand pose and shape estimation. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XII 16, pages 122--139. Springer, 2020

  54. [54]

    Acr: Attention collaboration-based regressor for arbitrary two-hand reconstruction

    Zhengdi Yu, Shaoli Huang, Chen Fang, Toby P Breckon, and Jue Wang. Acr: Attention collaboration-based regressor for arbitrary two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12955--12964, 2023

  55. [55]

    Interacting two-hand 3d pose and shape reconstruction from single color image

    Baowen Zhang, Yangang Wang, Xiaoming Deng, Yinda Zhang, Ping Tan, Cuixia Ma, and Hongan Wang. Interacting two-hand 3d pose and shape reconstruction from single color image. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11354--11363, 2021

  56. [56]

    PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop

    Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, Limin Wang, and Zhenan Sun. PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop. In ICCV, 2021

  57. [57]

    Learning 3D human shape and pose from dense body parts

    Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, and Zhenan Sun. Learning 3D human shape and pose from dense body parts. TPAMI, 44(5):2610--2627, 2022

  58. [58]

    PyMAF-X: Towards well-aligned full-body model regression from monocular images

    Hongwen Zhang, Yating Tian, Yuxiang Zhang, Mengcheng Li, Liang An, Zhenan Sun, and Yebin Liu. PyMAF-X: Towards well-aligned full-body model regression from monocular images. TPAMI, 2023

  59. [59]

    End-to-end hand mesh recovery from a monocular rgb image

    Xiong Zhang, Qiang Li, Hong Mo, Wenbo Zhang, and Wen Zheng. End-to-end hand mesh recovery from a monocular rgb image. In ICCV, 2019

  60. [60]

    Light-weight multi-person total capture using sparse multi-view cameras

    Yuxiang Zhang, Zhe Li, Liang An, Mengcheng Li, Tao Yu, and Yebin Liu. Light-weight multi-person total capture using sparse multi-view cameras. In ICCV, 2021

  61. [61]

    Exploiting spatial-temporal context for interacting hand reconstruction on monocular rgb video

    Weichao Zhao, Hezhen Hu, Wengang Zhou, Li Li, and Houqiang Li. Exploiting spatial-temporal context for interacting hand reconstruction on monocular rgb video. ACM Transactions on Multimedia Computing, Communications and Applications, 20(6):1--18, 2024

  62. [62]

    Learning to estimate 3d hand pose from single rgb images

    Christian Zimmermann and Thomas Brox. Learning to estimate 3d hand pose from single rgb images. In Proceedings of the IEEE international conference on computer vision, pages 4903--4911, 2017

  63. [63]

    Freihand: A dataset for markerless capture of hand pose and shape from single rgb images

    Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max J. Argus, and Thomas Brox. Freihand: A dataset for markerless capture of hand pose and shape from single rgb images. In ICCV, 2019

  64. [64]

    Reconstructing interacting hands with interaction prior from monocular images

    Binghui Zuo, Zimeng Zhao, Wenqian Sun, Wei Xie, Zhou Xue, and Yangang Wang. Reconstructing interacting hands with interaction prior from monocular images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9054--9064, 2023