OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer

Dixuan Lin; Hongwen Zhang; Mengcheng Li; Qianying Wang; Qi Yan; Wei Jing; Yebin Liu; Yuxiang Zhang

arXiv: 2405.20330 · v4 · submitted 2024-05-30 · 💻 cs.CV · cs.AI · cs.GR

Pith reviewed 2026-05-24 01:20 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.GR

keywords hand mesh recovery · 4D reconstruction · transformer · two-hand interaction · relation-aware tokenization · monocular input · multi-view · hand pose

The pith

OmniHands recovers 4D interactive hand meshes from monocular or multi-view inputs by embedding positional relations into tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OmniHands as a universal transformer architecture for recovering interactive hand meshes along with their relative movements from single or multiple camera views. It targets two gaps in prior work: the absence of a single model that works across single-hand and two-hand image inputs, and the neglect of the spatial layout between the two hands. Relation-aware Two-Hand Tokenization embeds the positional relationship directly into the tokens so the same network can process both input types and make use of that layout. A 4D Interaction Reasoning module then fuses the tokens over time using attention before decoding them into 3D meshes and relative motion. The resulting approach shows stronger results on standard benchmarks and real-world video for reconstructing hand interactions.

Core claim

OmniHands supplies a single architecture that adapts to varied hand-image tasks through new tokenization and fusion steps. Relation-aware Two-Hand Tokenization places positional relation data inside the hand tokens, letting the network treat single-hand and two-hand cases uniformly while still using their relative positions. The 4D Interaction Reasoning module fuses these tokens across time with attention and produces 3D hand meshes together with their temporal movements, which supports accurate recovery of detailed hand interactions.
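
To make the fusion step concrete, here is a minimal sketch of attention over hand tokens pooled across frames. It is not the paper's FIR module: the function names (`attend`, `fuse_over_time`), the single attention head, the absence of learned projections, and the flat token layout are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query over a token sequence."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights

def fuse_over_time(frames):
    """Hypothetical fusion: flatten the tokens of all frames and hands,
    then let every token attend to every other token (no learned weights)."""
    tokens = [tok for frame in frames for tok in frame]
    return [attend(tok, tokens, tokens)[0] for tok in tokens]
```

With two frames of two tokens each, `fuse_over_time` returns four fused tokens, each a convex combination of all four inputs. A real module would add learned query, key, and value projections and a decoder on top.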

What carries the argument

Relation-aware Two-Hand Tokenization (RAT), which inserts positional relation data into hand tokens so one network can handle single-hand and two-hand cases while using their spatial layout.
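
One way to picture relation-aware tokenization is sketched below. The page does not give the paper's actual encoding, so everything here is an assumption: the `Box` layout, the choice of normalized center offset plus log scale ratio, and the presence flag that lets a lone hand reuse the same token shape.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical hand crop description: (center_x, center_y, size) in pixels.
Box = Tuple[float, float, float]

def relation_features(own: Box, other: Optional[Box]) -> List[float]:
    """Encode where the other hand sits relative to this one.
    Returns zeros with a presence flag of 0.0 when there is no other hand."""
    if other is None:
        return [0.0, 0.0, 0.0, 0.0]
    dx = (other[0] - own[0]) / own[2]        # horizontal offset in own-box units
    dy = (other[1] - own[1]) / own[2]        # vertical offset in own-box units
    log_scale = math.log(other[2] / own[2])  # relative crop size
    return [dx, dy, log_scale, 1.0]

def tokenize(features: List[float], own: Box, other: Optional[Box]) -> List[float]:
    """Append the relation encoding to the per-hand image features."""
    return features + relation_features(own, other)
```

Because single-hand and two-hand inputs yield tokens of identical length under this scheme, one network can consume both, which is the property the claim above relies on.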

Load-bearing premise

Embedding positional relation information via RAT lets a single network process both single-hand and two-hand inputs while making explicit use of their spatial relationship.

What would settle it

On a two-hand interaction benchmark, a model variant without the RAT module would need to match the full model's accuracy on relative hand position and interaction reconstruction to undermine the tokenization step.
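
The comparison described above needs a score for relative hand position. A minimal sketch follows, assuming each hand is summarized by a 3D root position; the function names and the choice of a plain Euclidean error are illustrative and not taken from the paper.

```python
import math

def relative_root_error(pred_left, pred_right, gt_left, gt_right):
    """Euclidean error between predicted and ground-truth right-minus-left
    root offsets. Invariant to a shared translation of both predicted hands."""
    pred_rel = [r - l for r, l in zip(pred_right, pred_left)]
    gt_rel = [r - l for r, l in zip(gt_right, gt_left)]
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_rel, gt_rel)))

def mean_error(samples):
    """Average the error over (pred_left, pred_right, gt_left, gt_right) tuples."""
    return sum(relative_root_error(*s) for s in samples) / len(samples)
```

Running `mean_error` on the outputs of the full model and of the variant without RAT, over the same test samples, would give the two numbers the test above asks to compare.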

Figures

Figures reproduced from arXiv: 2405.20330 by Dixuan Lin, Hongwen Zhang, Mengcheng Li, Qianying Wang, Qi Yan, Wei Jing, Yebin Liu, Yuxiang Zhang.

**Figure 1.** The proposed method, OmniHands, can robustly recover interactive hand motions and their relative movement from monocular inputs.

**Figure 2.** Overview of our OmniHands framework. OmniHands is a transformer-based network, which takes various forms of inputs and estimates two-hand …

**Figure 3.** Qualitative Comparison. We compare OmniHands with state-of-the-art methods on in-the-wild datasets ARCTIC [Fan et al. …]

**Figure 4.** Qualitative Results of in-the-wild hard cases. We show our model's results on complex cross-hand interactions in realistic scenarios to demonstrate its …

**Figure 5.** The qualitative results of OmniHands with multi-view inputs. We present the results on InterHand2.6M [Moon et al. …]

**Figure 6.** Ablation study on occlusion case. 'Full Model' is our model with …

**Figure 7.** Ablation study of RAT. Two views of predicted hand meshes are …

Original abstract

In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://OmniHand.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

OmniHands unifies single- and two-hand mesh recovery with RAT tokenization that embeds relative positions and a FIR module for 4D fusion, a practical incremental design whose gains are hard to judge without numbers.

The main contribution is a single transformer that handles both single-hand and two-hand inputs by using Relation-aware Two-Hand Tokenization (RAT) to inject positional relation information directly into the tokens, then feeding those into a 4D Interaction Reasoning (FIR) module that fuses features across time and space to produce meshes plus relative movements. This directly targets the gap where prior work either treated hands separately or ignored their spatial relationship in the image.

The design is concrete and follows logically from standard attention mechanisms, so the argument does not contain hidden contradictions or unstated preconditions about input cardinality. The stress-test note is correct on that point. The paper does a clean job of motivating why explicit relation encoding should help with intricate interactions in real scenes. What is new is the specific pairing of RAT with FIR rather than any fundamental new equation or training trick.

The abstract claims validation on benchmarks and better in-the-wild results, but it supplies no quantitative scores, ablations, or error bars, which makes it impossible to tell how much the new components actually move performance. That is the main soft spot; everything else is proportionate to an architectural proposal.

This paper is for people already working on hand mesh recovery who need a model that can switch between single- and two-hand cases without retraining. A reader looking for reusable design patterns in transformer tokenization for pose tasks would get value from the details. It deserves a serious referee because the problem is real, the architecture is described clearly enough to implement, and the motivation is sound even if the results section needs close checking.

Referee Report

0 major / 1 minor

Summary. The paper introduces OmniHands, a universal transformer-based architecture for 4D hand mesh recovery from monocular or multi-view inputs. It proposes Relation-aware Two-Hand Tokenization (RAT) to embed positional relation information into hand tokens, enabling a single network to process both single-hand and two-hand inputs while explicitly using their spatial relationship. A 4D Interaction Reasoning (FIR) module is developed to fuse hand tokens in 4D via attention and decode them into 3D meshes and relative temporal movements. The approach is claimed to be validated on benchmark datasets with superior performance on in-the-wild videos and real-world scenarios.

Significance. If the empirical results and ablations substantiate the claims, the work would provide a unified framework that explicitly models hand interactions, addressing limitations of prior methods that lack unified handling of input types or neglect positional relationships, with potential impact on interactive hand reconstruction tasks.

minor comments (1)

[Abstract] The abstract asserts validation on benchmarks and superior performance on in-the-wild videos, yet it supplies no quantitative numbers, error bars, ablation results, or derivation details, so a reader cannot verify whether the data support the central claim about RAT and FIR.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of OmniHands and for noting its potential significance as a unified framework for interactive hand reconstruction, conditional on the empirical results. We observe that the recommendation is 'uncertain' and that no specific major comments were listed under the MAJOR COMMENTS section. We therefore provide no point-by-point responses but remain available to supply additional details on the experiments, ablations, or any other aspect of the manuscript to address the uncertainty regarding substantiation of the claims.

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents a new transformer-based architecture (OmniHands) with a proposed Relation-aware Two-Hand Tokenization (RAT) module and 4D Interaction Reasoning (FIR) module as design choices for handling single- and two-hand inputs. No equations, fitted parameters, or predictions are shown that reduce by construction to the inputs; the central claims rest on the architectural description and empirical validation on benchmarks rather than self-referential definitions or load-bearing self-citations. The derivation chain is self-contained as a proposed method without the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters or axioms; the approach appears to rest on standard transformer assumptions plus the two newly proposed modules.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 1 internal anchor

[2] Seungryul Baek, Kwang In Kim, and Tae-Kyun Kim. Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[3] Adnane Boukhayma, Rodrigo de Bem, and Philip HS Torr. 3d hand shape and pose from images in the wild. In CVPR, 2019.

[4] Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, and Dieter Fox. DexYCB: A benchmark for capturing hand grasping of objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[5] Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, and Wen Zheng. Camera-space hand mesh recovery via semantic aggregation and adaptive 2d-1d registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[6] Xingyu Chen, Yufeng Liu, Dong Yajiao, Xiong Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, and Xiaoyan Guo. Mobrecon: Mobile-friendly hand mesh reconstruction from monocular image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[7] Hongsuk Choi, Gyeongsik Moon, and Kyoung Mu Lee. Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In European Conference on Computer Vision (ECCV), 2020.

[8] Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee. Beyond static features for temporally consistent 3d human pose and shape from a video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1964–1973, 2021.

[9] Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Monocular expressive body regression through body-driven attention. In European Conference on Computer Vision (ECCV), 2020.

[10] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

[11] Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. Arctic: A dataset for dexterous bimanual hand-object manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12943–12954, 2023.

[12] Hao-Shu Fang, Jiefeng Li, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, and Cewu Lu. Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

[13] Yao Feng, Vasileios Choutas, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Collaborative regression of expressive bodies using moderation. In International Conference on 3D Vision (3DV), 2021.

[14] Qichen Fu, Xingyu Liu, Ran Xu, Juan Carlos Niebles, and Kris M Kitani. Deformer: Dynamic fusion transformer for robust hand pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23600–23611, 2023.

[15] Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, and Junsong Yuan. 3D hand shape and pose estimation from a single RGB image. In CVPR, 2019.

[16] Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4d: Reconstructing and tracking humans with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14783–14794, 2023.

[17] Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vincent Lepetit. Honnotate: A method for 3d annotation of hand and object poses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3196–3206, 2020.

[18] Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, and Cordelia Schmid. Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In CVPR, 2020.

[19] Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo. Whole-body human pose estimation in the wild. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX 16, pages 196–214. Springer, 2020.

[20] Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total capture: A 3D deformation model for tracking faces, hands, and bodies. In CVPR, 2018.

[21] Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In CVPR, pages 7122–7131, 2018.

[22] Dong Uk Kim, Kwang In Kim, and Seungryul Baek. End-to-end detection and pose estimation of two interacting hands. In ICCV, 2021.

[23] Muhammed Kocabas, Nikos Athanasiou, and Michael J Black. Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5253–5263, 2020.

[24] Nikos Kolotouros, Georgios Pavlakos, Michael J Black, and Kostas Daniilidis. Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In ICCV, pages 2252–2261, 2019.

[25] Nikos Kolotouros, Georgios Pavlakos, and Kostas Daniilidis. Convolutional mesh regression for single-image human shape reconstruction. In CVPR, 2019.

[26] Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, and Stefanos Zafeiriou. Weakly-supervised mesh-convolutional hand reconstruction in the wild. In CVPR, 2020.

[27] Lijun Li, Linrui Tian, Xindi Zhang, Qi Wang, Bang Zhang, Liefeng Bo, Mengyuan Liu, and Chen Chen. Renderih: A large-scale synthetic dataset for 3d interacting hand pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20395–20405, 2023.

[28] Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, and Yebin Liu. Interacting attention graph for single image two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2761–2770, 2022.

[29] Mengcheng Li, Hongwen Zhang, Yuxiang Zhang, Ruizhi Shao, Tao Yu, and Yebin Liu. HHMR: Holistic hand mesh recovery by enhancing the multimodal controllability of graph diffusion models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024.

[30] Kevin Lin, Lijuan Wang, and Zicheng Liu. End-to-end human pose and mesh reconstruction with transformers. In CVPR, 2021.

[31] Kevin Lin, Lijuan Wang, and Zicheng Liu. Mesh graphormer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12939–12948, 2021.

[32] Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, and Xiaolong Wang. Semi-supervised 3d hand-object poses estimation with interactions in time. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2021.

[33] Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, and Ping Luo. 3D interacting hand pose estimation by hand de-occlusion and removal. In ECCV, pages 380–397. Springer, 2022.

[34] Gyeongsik Moon. Bringing inputs to shared domains for 3d interacting hands recovery in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17028–17037, 2023.

[35] Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, and Kyoung Mu Lee. InterHand2.6M: A dataset and baseline for 3d interacting hand pose estimation from a single rgb image. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16, pages 548–564. Springer, 2020.

[36] Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Accurate 3d hand pose estimation for whole-body 3d human mesh estimation. In Computer Vision and Pattern Recognition Workshop (CVPRW), 2022.

[37] Gyeongsik Moon, Shunsuke Saito, Weipeng Xu, Rohan Joshi, Julia Buffalini, Harley Bellan, Nicholas Rosen, Jesse Richardson, Mallorie Mize, Philippe De Bree, et al. A dataset of relighted 3d interacting hands. Advances in Neural Information Processing Systems, 36, 2024.

[38] JoonKyu Park, Yeonguk Oh, Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Handoccnet: Occlusion-robust 3d hand mesh estimation network. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[39] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.

[40] Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, and Jitendra Malik. Reconstructing hands in 3d with transformers. arXiv preprint arXiv:2312.05251, 2023.

[41] Pengfei Ren, Chao Wen, Xiaozheng Zheng, Zhou Xue, Haifeng Sun, Qi Qi, Jingyu Wang, and Jianxin Liao. Decoupled iterative refinement framework for interacting hands reconstruction from a single rgb image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8014–8025, 2023.

[42] Javier Romero, Dimitrios Tzionas, and Michael J. Black. Embodied hands: Modeling and capturing hands and bodies together. In SIGGRAPH Asia, 2017.

[43] Yu Rong, Takaaki Shiratori, and Hanbyul Joo. Frankmocap: Fast monocular 3D hand and body motion capture by regression and integration. In ICCVW, 2021.

[44] Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh. Hand keypoint detection in single images using multiview bootstrapping. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 1145–1153, 2017.

[45] Xiao Tang, Tianyu Wang, and Chi-Wing Fu. Towards accurate alignment in real-time 3D hand-mesh reconstruction. In ICCV, 2021.

[46] Yating Tian, Hongwen Zhang, Yebin Liu, and Limin Wang. Recovering 3D human mesh from monocular images: A survey. IEEE transactions on pattern analysis and machine intelligence, 2023.

[47] Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, and Junsong Yuan. Consistent 3d hand reconstruction in video via self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.

[48] Dimitrios Tzionas, Luca Ballan, Abhilash Srikantha, Pablo Aponte, Marc Pollefeys, and Juergen Gall. Capturing hands in action using discriminative salient points and physics simulation. International Journal of Computer Vision, 118:172–193, 2016.

[49] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc.

[50] Congyi Wang, Feida Zhu, and Shilei Wen. Memahand: Exploiting mesh-mano interaction for single image two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 564–573, 2023.

[51] Donglai Xiang, Hanbyul Joo, and Yaser Sheikh. Monocular total capture: Posing face, body, and hands in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10965–10974, 2019.

[52] Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao. Vitpose: Simple vision transformer baselines for human pose estimation. Advances in Neural Information Processing Systems, 35:38571–38584, 2022.

[53] John Yang, Hyung Jin Chang, Seungeui Lee, and Nojun Kwak. Seqhand: Rgb-sequence-based 3d hand pose and shape estimation. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16, pages 122–139. Springer, 2020.

[54] Zhengdi Yu, Shaoli Huang, Chen Fang, Toby P Breckon, and Jue Wang. Acr: Attention collaboration-based regressor for arbitrary two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12955–12964, 2023.

[55] Baowen Zhang, Yangang Wang, Xiaoming Deng, Yinda Zhang, Ping Tan, Cuixia Ma, and Hongan Wang. Interacting two-hand 3d pose and shape reconstruction from single color image. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11354–11363, 2021.

[56] Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, Limin Wang, and Zhenan Sun. PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop. In ICCV, 2021.

[57] Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, and Zhenan Sun. Learning 3D human shape and pose from dense body parts. TPAMI, 44(5):2610–2627, 2022.

[58] Hongwen Zhang, Yating Tian, Yuxiang Zhang, Mengcheng Li, Liang An, Zhenan Sun, and Yebin Liu. PyMAF-X: Towards well-aligned full-body model regression from monocular images. TPAMI, 2023.

[59] Xiong Zhang, Qiang Li, Hong Mo, Wenbo Zhang, and Wen Zheng. End-to-end hand mesh recovery from a monocular rgb image. In ICCV, 2019.

[60] Yuxiang Zhang, Zhe Li, Liang An, Mengcheng Li, Tao Yu, and Yebin Liu. Light-weight multi-person total capture using sparse multi-view cameras. In ICCV, 2021.

[61] Weichao Zhao, Hezhen Hu, Wengang Zhou, Li Li, and Houqiang Li. Exploiting spatial-temporal context for interacting hand reconstruction on monocular rgb video. ACM Transactions on Multimedia Computing, Communications and Applications, 20(6):1–18, 2024.

[62] Christian Zimmermann and Thomas Brox. Learning to estimate 3d hand pose from single rgb images. In Proceedings of the IEEE international conference on computer vision, pages 4903–4911, 2017.

[63] Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max J. Argus, and Thomas Brox. Freihand: A dataset for markerless capture of hand pose and shape from single rgb images. In ICCV, 2019.

[64] Binghui Zuo, Zimeng Zhao, Wenqian Sun, Wei Xie, Zhou Xue, and Yangang Wang. Reconstructing interacting hands with interaction prior from monocular images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9054–9064, 2023.

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering

Seungryul Baek, Kwang In Kim, and Tae-Kyun Kim. Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019

work page 2019

[3] [3]

3d hand shape and pose from images in the wild

Adnane Boukhayma, Rodrigo de Bem, and Philip HS Torr. 3d hand shape and pose from images in the wild. In CVPR, 2019

work page 2019

[4] [4]

DexYCB: A benchmark for capturing hand grasping of objects

Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, and Dieter Fox. DexYCB: A benchmark for capturing hand grasping of objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

work page 2021

[5] [5]

Camera-space hand mesh recovery via semantic aggregation and adaptive 2d-1d registration

Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, and Wen Zheng. Camera-space hand mesh recovery via semantic aggregation and adaptive 2d-1d registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

work page 2021

[6] [6]

Mobrecon: Mobile-friendly hand mesh reconstruction from monocular image

Xingyu Chen, Yufeng Liu, Dong Yajiao, Xiong Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, and Xiaoyan Guo. Mobrecon: Mobile-friendly hand mesh reconstruction from monocular image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

work page 2022

[7] [7]

Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose

Hongsuk Choi, Gyeongsik Moon, and Kyoung Mu Lee. Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose. In European Conference on Computer Vision (ECCV), 2020

work page 2020

[8] [8]

Beyond static features for temporally consistent 3d human pose and shape from a video

Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee. Beyond static features for temporally consistent 3d human pose and shape from a video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1964--1973, 2021

work page 2021

[9] [9]

Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Monocular expressive body regression through body-driven attention. In European Conference on Computer Vision (ECCV), 2020

work page 2020

[10] [10]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

work page 2020

[11] [11]

Arctic: A dataset for dexterous bimanual hand-object manipulation

Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J Black, and Otmar Hilliges. Arctic: A dataset for dexterous bimanual hand-object manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12943--12954, 2023

work page 2023

[12] [12]

Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time

Hao-Shu Fang, Jiefeng Li, Hongyang Tang, Chao Xu, Haoyi Zhu, Yuliang Xiu, Yong-Lu Li, and Cewu Lu. Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

work page 2022

[13] [13]

Yao Feng, Vasileios Choutas, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Collaborative regression of expressive bodies using moderation. In International Conference on 3D Vision (3DV), 2021

work page 2021

[14] [14]

Deformer: Dynamic fusion transformer for robust hand pose estimation

Qichen Fu, Xingyu Liu, Ran Xu, Juan Carlos Niebles, and Kris M Kitani. Deformer: Dynamic fusion transformer for robust hand pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 23600--23611, 2023

work page 2023

[15] [15]

3D hand shape and pose estimation from a single RGB image

Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, and Junsong Yuan. 3D hand shape and pose estimation from a single RGB image. In CVPR, 2019

work page 2019

[16] [16]

Humans in 4d: Reconstructing and tracking humans with transformers

Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4d: Reconstructing and tracking humans with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14783--14794, 2023

work page 2023

[17] [17]

Honnotate: A method for 3d annotation of hand and object poses

Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vincent Lepetit. Honnotate: A method for 3d annotation of hand and object poses. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3196--3206, 2020

work page 2020

[18] [18]

Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction

Yana Hasson, Bugra Tekin, Federica Bogo, Ivan Laptev, Marc Pollefeys, and Cordelia Schmid. Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In CVPR, 2020

work page 2020

[19] [19]

Whole-body human pose estimation in the wild

Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo. Whole-body human pose estimation in the wild. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part IX 16, pages 196--214. Springer, 2020

work page 2020

[20] [20]

Total capture: A 3D deformation model for tracking faces, hands, and bodies

Hanbyul Joo, Tomas Simon, and Yaser Sheikh. Total capture: A 3D deformation model for tracking faces, hands, and bodies. In CVPR, 2018

work page 2018

[21] [21]

End-to-end recovery of human shape and pose

Angjoo Kanazawa, Michael J Black, David W Jacobs, and Jitendra Malik. End-to-end recovery of human shape and pose. In CVPR, pages 7122--7131, 2018

work page 2018

[22] [22]

End-to-end detection and pose estimation of two interacting hands

Dong Uk Kim, Kwang In Kim, and Seungryul Baek. End-to-end detection and pose estimation of two interacting hands. In ICCV, 2021

work page 2021

[23] [23]

Vibe: Video inference for human body pose and shape estimation

Muhammed Kocabas, Nikos Athanasiou, and Michael J Black. Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5253--5263, 2020

work page 2020

[24] [24]

Learning to reconstruct 3d human pose and shape via model-fitting in the loop

Nikos Kolotouros, Georgios Pavlakos, Michael J Black, and Kostas Daniilidis. Learning to reconstruct 3d human pose and shape via model-fitting in the loop. In ICCV, pages 2252--2261, 2019

work page 2019

[25] [25]

Convolutional mesh regression for single-image human shape reconstruction

Nikos Kolotouros, Georgios Pavlakos, and Kostas Daniilidis. Convolutional mesh regression for single-image human shape reconstruction. In CVPR, 2019

work page 2019

[26] [26]

Weakly-supervised mesh-convolutional hand reconstruction in the wild

Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, and Stefanos Zafeiriou. Weakly-supervised mesh-convolutional hand reconstruction in the wild. In CVPR, 2020

work page 2020

[27] [27]

Renderih: A large-scale synthetic dataset for 3d interacting hand pose estimation

Lijun Li, Linrui Tian, Xindi Zhang, Qi Wang, Bang Zhang, Liefeng Bo, Mengyuan Liu, and Chen Chen. Renderih: A large-scale synthetic dataset for 3d interacting hand pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20395--20405, 2023

work page 2023

[28] [28]

Interacting attention graph for single image two-hand reconstruction

Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, and Yebin Liu. Interacting attention graph for single image two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2761--2770, 2022

work page 2022

[29] [29]

HHMR: Holistic hand mesh recovery by enhancing the multimodal controllability of graph diffusion models

Mengcheng Li, Hongwen Zhang, Yuxiang Zhang, Ruizhi Shao, Tao Yu, and Yebin Liu. HHMR: Holistic hand mesh recovery by enhancing the multimodal controllability of graph diffusion models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2024

work page 2024

[30] [30]

End-to-end human pose and mesh reconstruction with transformers

Kevin Lin, Lijuan Wang, and Zicheng Liu. End-to-end human pose and mesh reconstruction with transformers. In CVPR, 2021

work page 2021

[31] [31]

Mesh graphormer

Kevin Lin, Lijuan Wang, and Zicheng Liu. Mesh graphormer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12939--12948, 2021

work page 2021

[32] [32]

Semi-supervised 3d hand-object poses estimation with interactions in time

Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, and Xiaolong Wang. Semi-supervised 3d hand-object poses estimation with interactions in time. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2021

work page 2021

[33] [33]

3D interacting hand pose estimation by hand de-occlusion and removal

Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, and Ping Luo. 3D interacting hand pose estimation by hand de-occlusion and removal. In ECCV, pages 380--397. Springer, 2022

work page 2022

[34] [34]

Bringing inputs to shared domains for 3d interacting hands recovery in the wild

Gyeongsik Moon. Bringing inputs to shared domains for 3d interacting hands recovery in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17028--17037, 2023

work page 2023

[35] [35]

Interhand2.6m: A dataset and baseline for 3d interacting hand pose estimation from a single rgb image

Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, and Kyoung Mu Lee. Interhand2.6m: A dataset and baseline for 3d interacting hand pose estimation from a single rgb image. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XX 16, pages 548--564. Springer, 2020

work page 2020

[36] [36]

Accurate 3d hand pose estimation for whole-body 3d human mesh estimation

Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Accurate 3d hand pose estimation for whole-body 3d human mesh estimation. In Computer Vision and Pattern Recognition Workshop (CVPRW), 2022

work page 2022

[37] [37]

A dataset of relighted 3d interacting hands

Gyeongsik Moon, Shunsuke Saito, Weipeng Xu, Rohan Joshi, Julia Buffalini, Harley Bellan, Nicholas Rosen, Jesse Richardson, Mallorie Mize, Philippe De Bree, et al. A dataset of relighted 3d interacting hands. Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[38] [38]

Handoccnet: Occlusion-robust 3d hand mesh estimation network

JoonKyu Park, Yeonguk Oh, Gyeongsik Moon, Hongsuk Choi, and Kyoung Mu Lee. Handoccnet: Occlusion-robust 3d hand mesh estimation network. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022

work page 2022

[39] [39]

Pytorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019

work page 2019

[40] [40]

Reconstructing hands in 3d with transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, and Jitendra Malik. Reconstructing hands in 3d with transformers. arXiv preprint arXiv:2312.05251, 2023

work page 2023

[41] [41]

Decoupled iterative refinement framework for interacting hands reconstruction from a single rgb image

Pengfei Ren, Chao Wen, Xiaozheng Zheng, Zhou Xue, Haifeng Sun, Qi Qi, Jingyu Wang, and Jianxin Liao. Decoupled iterative refinement framework for interacting hands reconstruction from a single rgb image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8014--8025, 2023

work page 2023

[42] [42]

Javier Romero, Dimitrios Tzionas, and Michael J. Black. Embodied hands: Modeling and capturing hands and bodies together. In SIGGRAPH Asia, 2017

work page 2017

[43] [43]

Frankmocap: Fast monocular 3D hand and body motion capture by regression and integration

Yu Rong, Takaaki Shiratori, and Hanbyul Joo. Frankmocap: Fast monocular 3D hand and body motion capture by regression and integration. In ICCVW, 2021

work page 2021

[44] [44]

Hand keypoint detection in single images using multiview bootstrapping

Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh. Hand keypoint detection in single images using multiview bootstrapping. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 1145--1153, 2017

work page 2017

[45] [45]

Towards accurate alignment in real-time 3D hand-mesh reconstruction

Xiao Tang, Tianyu Wang, and Chi-Wing Fu. Towards accurate alignment in real-time 3D hand-mesh reconstruction. In ICCV, 2021

work page 2021

[46] [46]

Recovering 3D human mesh from monocular images: A survey

Yating Tian, Hongwen Zhang, Yebin Liu, and Limin Wang. Recovering 3D human mesh from monocular images: A survey. IEEE transactions on pattern analysis and machine intelligence, 2023

work page 2023

[47] [47]

Consistent 3d hand reconstruction in video via self-supervised learning

Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, and Junsong Yuan. Consistent 3d hand reconstruction in video via self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

work page 2023

[48] [48]

Capturing hands in action using discriminative salient points and physics simulation

Dimitrios Tzionas, Luca Ballan, Abhilash Srikantha, Pablo Aponte, Marc Pollefeys, and Juergen Gall. Capturing hands in action using discriminative salient points and physics simulation. International Journal of Computer Vision, 118:172--193, 2016

work page 2016

[49] [49]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6000--6010, Red Hook, NY, USA, 2017. Curran Associates Inc

work page 2017

[50] [50]

Memahand: Exploiting mesh-mano interaction for single image two-hand reconstruction

Congyi Wang, Feida Zhu, and Shilei Wen. Memahand: Exploiting mesh-mano interaction for single image two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 564--573, 2023

work page 2023

[51] [51]

Monocular total capture: Posing face, body, and hands in the wild

Donglai Xiang, Hanbyul Joo, and Yaser Sheikh. Monocular total capture: Posing face, body, and hands in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10965--10974, 2019

work page 2019

[52] [52]

Vitpose: Simple vision transformer baselines for human pose estimation

Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao. Vitpose: Simple vision transformer baselines for human pose estimation. Advances in Neural Information Processing Systems, 35:38571--38584, 2022

work page 2022

[53] [53]

Seqhand: Rgb-sequence-based 3d hand pose and shape estimation

John Yang, Hyung Jin Chang, Seungeui Lee, and Nojun Kwak. Seqhand: Rgb-sequence-based 3d hand pose and shape estimation. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XII 16, pages 122--139. Springer, 2020

work page 2020

[54] [54]

Acr: Attention collaboration-based regressor for arbitrary two-hand reconstruction

Zhengdi Yu, Shaoli Huang, Chen Fang, Toby P Breckon, and Jue Wang. Acr: Attention collaboration-based regressor for arbitrary two-hand reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12955--12964, 2023

work page 2023

[55] [55]

Interacting two-hand 3d pose and shape reconstruction from single color image

Baowen Zhang, Yangang Wang, Xiaoming Deng, Yinda Zhang, Ping Tan, Cuixia Ma, and Hongan Wang. Interacting two-hand 3d pose and shape reconstruction from single color image. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11354--11363, 2021

work page 2021

[56] [56]

PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop

Hongwen Zhang, Yating Tian, Xinchi Zhou, Wanli Ouyang, Yebin Liu, Limin Wang, and Zhenan Sun. PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop. In ICCV, 2021

work page 2021

[57] [57]

Learning 3D human shape and pose from dense body parts

Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, and Zhenan Sun. Learning 3D human shape and pose from dense body parts. TPAMI, 44(5):2610--2627, 2022

work page 2022

[58] [58]

PyMAF-X: Towards well-aligned full-body model regression from monocular images

Hongwen Zhang, Yating Tian, Yuxiang Zhang, Mengcheng Li, Liang An, Zhenan Sun, and Yebin Liu. PyMAF-X: Towards well-aligned full-body model regression from monocular images. TPAMI, 2023

work page 2023

[59] [59]

End-to-end hand mesh recovery from a monocular rgb image

Xiong Zhang, Qiang Li, Hong Mo, Wenbo Zhang, and Wen Zheng. End-to-end hand mesh recovery from a monocular rgb image. In ICCV, 2019

work page 2019
