OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer
Pith reviewed 2026-05-24 01:20 UTC · model grok-4.3
The pith
OmniHands recovers 4D interactive hand meshes from monocular or multi-view inputs by embedding positional relations into tokens.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OmniHands supplies a single architecture that adapts to varied hand-image tasks through new tokenization and fusion steps. Relation-aware Two-Hand Tokenization places positional relation data inside the hand tokens, letting the network treat single-hand and two-hand cases uniformly while still using their relative positions. The 4D Interaction Reasoning module fuses these tokens across time with attention and produces 3D hand meshes together with their temporal movements, which supports accurate recovery of detailed hand interactions.
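The fusion step described above can be pictured with a small sketch. It is only an illustration of attending over hand tokens pooled across frames, not the paper's 4D Interaction Reasoning module: the function names, the single attention head, and the identity query/key/value projections are all assumptions made here to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (queries = keys = values = tokens),
    chosen only to keep the sketch self-contained.
    """
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return fused


def fuse_4d(sequence):
    """Hypothetical helper: sequence[t][h] is the token of hand h at frame t.

    All hand tokens from all frames are flattened into one set, attended
    jointly, and reshaped back to (frames, hands).
    """
    hands_per_frame = len(sequence[0])
    flat = [token for frame in sequence for token in frame]
    fused = self_attention(flat)
    return [
        fused[i:i + hands_per_frame]
        for i in range(0, len(fused), hands_per_frame)
    ]
```

Because every output token is a weighted average of all input tokens, each hand at each frame can draw on the other hand and on neighbouring frames. A real module would add learned projections, multiple heads, and a decoder that outputs meshes and relative motion.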
What carries the argument
Relation-aware Two-Hand Tokenization (RAT), which inserts positional relation data into hand tokens so one network can handle single-hand and two-hand cases while using their spatial layout.
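As a rough illustration of what "inserting positional relation data into hand tokens" could look like, the sketch below appends a small relation vector to each hand's feature vector. The box format `(cx, cy, size)`, the four-element relation vector, and the function names are assumptions for this example; the summary above does not specify how the paper encodes the relation.

```python
import math


def relation_vector(box_self, box_other):
    """Return [other_present, dx, dy, log_scale_ratio] for one hand.

    Assumption: boxes are (cx, cy, size) in pixels. Offsets are expressed
    in units of this hand's box size so they are scale-normalised.
    """
    if box_other is None:
        # Single-hand case: zero vector, with the presence flag set to 0.
        return [0.0, 0.0, 0.0, 0.0]
    cx, cy, size = box_self
    ox, oy, other_size = box_other
    return [1.0, (ox - cx) / size, (oy - cy) / size, math.log(other_size / size)]


def tokenize_hands(features, boxes):
    """Build one token per detected hand.

    features and boxes are dicts keyed by 'left' / 'right'; a missing hand
    is simply absent (or None) in boxes.
    """
    tokens = {}
    for side, other in (("left", "right"), ("right", "left")):
        if boxes.get(side) is None:
            continue
        relation = relation_vector(boxes[side], boxes.get(other))
        tokens[side] = list(features[side]) + relation
    return tokens
```

With this layout, tokens have the same length whether one or two hands are visible, which is the property that lets one network consume both cases.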
Load-bearing premise
Embedding positional relation information via RAT lets a single network process both single-hand and two-hand inputs while making explicit use of their spatial relationship.
What would settle it
If a model variant without the RAT module matched the full model's accuracy on relative hand position and interaction reconstruction on a two-hand interaction benchmark, the case for the tokenization step would be undermined.
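One way such an ablation comparison could be scored is sketched below. The metric (mean Euclidean error of the offset between the two hand roots) is an assumption chosen for illustration, as are the function names and the sample layout; the page does not say which metric the paper reports.

```python
import math


def relative_root_error(predictions, ground_truth):
    """Mean Euclidean error of the right-to-left root offset.

    Each sample is a pair (right_root_xyz, left_root_xyz). Only the offset
    between the two roots is compared, so a global translation of both
    hands does not affect the score.
    """
    total = 0.0
    for (pred_right, pred_left), (gt_right, gt_left) in zip(predictions, ground_truth):
        pred_offset = [l - r for l, r in zip(pred_left, pred_right)]
        gt_offset = [l - r for l, r in zip(gt_left, gt_right)]
        total += math.dist(pred_offset, gt_offset)
    return total / len(ground_truth)


def ablation_gap(error_full, error_without_rat):
    """Positive values mean the variant without RAT is worse."""
    return error_without_rat - error_full
```

A gap near zero across a two-hand benchmark would count against the tokenization step; a consistently positive gap would support it.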
Original abstract
In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://OmniHand.github.io.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces OmniHands, a universal transformer-based architecture for 4D hand mesh recovery from monocular or multi-view inputs. It proposes Relation-aware Two-Hand Tokenization (RAT) to embed positional relation information into hand tokens, enabling a single network to process both single-hand and two-hand inputs while explicitly using their spatial relationship. A 4D Interaction Reasoning (FIR) module is developed to fuse hand tokens in 4D via attention and decode them into 3D meshes and relative temporal movements. The approach is claimed to be validated on benchmark datasets with superior performance on in-the-wild videos and real-world scenarios.
Significance. If the empirical results and ablations substantiate the claims, the work would provide a unified framework that explicitly models hand interactions, addressing limitations of prior methods that lack unified handling of input types or neglect positional relationships, with potential impact on interactive hand reconstruction tasks.
minor comments (1)
- [Abstract] The abstract asserts validation on benchmarks and superior performance on in-the-wild videos, yet it supplies no quantitative numbers, error bars, ablation results, or derivation details. This prevents verification of whether the data support the central claim about RAT and FIR.
Simulated Author's Rebuttal
We thank the referee for their summary of OmniHands and for noting its potential significance as a unified framework for interactive hand reconstruction, conditional on the empirical results. We observe that the recommendation is 'uncertain' and that no specific major comments were listed under the MAJOR COMMENTS section. We therefore provide no point-by-point responses but remain available to supply additional details on the experiments, ablations, or any other aspect of the manuscript to address the uncertainty regarding substantiation of the claims.
Circularity Check
No significant circularity detected
The paper presents a new transformer-based architecture (OmniHands) with a proposed Relation-aware Two-Hand Tokenization (RAT) module and 4D Interaction Reasoning (FIR) module as design choices for handling single- and two-hand inputs. No equations, fitted parameters, or predictions are shown that reduce by construction to the inputs; the central claims rest on the architectural description and empirical validation on benchmarks rather than self-referential definitions or load-bearing self-citations. The derivation chain is self-contained as a proposed method without the enumerated circularity patterns.