pith. machine review for the scientific record.

arxiv: 2604.17959 · v1 · submitted 2026-04-20 · 💻 cs.CV · cs.GR

Recognition: unknown

Chatting about Upper-Body Expressive Human Pose and Shape Estimation


Pith reviewed 2026-05-10 05:32 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords upper-body · expressive pose estimation · shape estimation · transformer · cross-dependency · generalization · AR/VR

The pith

CoEvoer uses explicit cross-part feature exchanges in a transformer to improve upper-body pose and shape estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CoEvoer as a one-stage framework that lets different upper-body regions interact directly at the feature level. Larger, easier-to-estimate areas such as the torso supply global context and position cues to smaller, harder regions like the face and hands, while the finer details from those regions in turn calibrate the surrounding parts. This mutual refinement is achieved through a synergistic cross-dependency transformer that jointly regresses all parameters. The authors show the approach reaches state-of-the-art accuracy on standard upper-body benchmarks and maintains performance on previously unseen wild images. The work is presented as the first method built specifically around the semantic couplings among face, hands, and torso for expressive estimation.

Core claim

CoEvoer is the first framework designed for upper-body expressive human pose and shape estimation. Its synergistic cross-dependency transformer enables explicit feature-level interactions: global semantics and positional priors from the torso guide the face and hands, while localized details from the face and hands refine adjacent body parts. All parameters are regressed jointly, improving both benchmark accuracy and generalization to wild images.

What carries the argument

The synergistic cross-dependency transformer, which performs explicit feature-level interactions so that larger regions supply global guidance and finer regions supply calibration to neighboring parts.
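The exchange pattern the paper describes can be sketched with plain cross-attention between per-part token sets. This is a minimal numpy illustration of the idea, not the paper's architecture: token counts, feature dimension, and the residual form are all assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Tokens of one body part attend over another part's tokens.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (Nq, Nk)
    return softmax(scores, axis=-1) @ keys_values   # (Nq, d)

rng = np.random.default_rng(0)
d = 16
torso = rng.normal(size=(8, d))   # large, easier-to-estimate region
face  = rng.normal(size=(4, d))   # small, harder region
hands = rng.normal(size=(6, d))

# Torso -> face/hands: global semantics and position cues guide fine regions.
face_refined  = face  + cross_attend(face,  torso)
hands_refined = hands + cross_attend(hands, torso)

# Face/hands -> torso: localized detail calibrates the surrounding part.
torso_refined = torso + cross_attend(torso, np.vstack([face, hands]))
```

In a real model the queries, keys, and values would pass through learned projections and the exchange would repeat across layers; the sketch only shows the bidirectional routing.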

If this is right

  • More accurate joint estimation of facial, hand, and torso parameters than methods that treat regions independently.
  • Stronger performance on images from unconstrained environments without additional training data.
  • A single-stage pipeline that captures semantic dependencies among upper-body parts instead of sequential or separate processing.
  • Direct applicability to AR/VR tasks that require expressive upper-body reconstruction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same cross-part guidance pattern could be tested on full-body models to see whether torso-to-limb and limb-to-torso exchanges remain beneficial.
  • If the method proves robust across body shapes and clothing, it could support real-time applications such as live performance capture.
  • The explicit dependency structure offers a template for other vision tasks where coarse context and fine details must interact, such as scene parsing or object part segmentation.

Load-bearing premise

That exchanging contextual features between torso, face, and hands will reliably improve estimates rather than introduce conflicting signals.

What would settle it

Training the same architecture without the cross-dependency module and finding equal or higher accuracy on the same upper-body benchmarks plus comparable results on wild images would show the interactions are not required.
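The decisive ablation amounts to a single switch: identical backbone and training, with cross-part exchange replaced by per-part self-processing. A hedged sketch of that comparison (names and structure are illustrative, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    d = q.shape[-1]
    return softmax(q @ kv.T / np.sqrt(d), axis=-1) @ kv

def refine_parts(parts, use_cross_dependency):
    """parts: dict of per-part token arrays sharing a feature dim.

    With the switch on, each part attends over all *other* parts
    (the cross-dependency path); with it off, each part attends
    only over itself, keeping capacity roughly comparable.
    """
    out = {}
    for name, toks in parts.items():
        if use_cross_dependency:
            others = np.vstack([t for n, t in parts.items() if n != name])
            out[name] = toks + attend(toks, others)
        else:
            out[name] = toks + attend(toks, toks)  # self-only baseline
    return out

rng = np.random.default_rng(1)
parts = {"torso": rng.normal(size=(8, 16)),
         "face":  rng.normal(size=(4, 16)),
         "hands": rng.normal(size=(6, 16))}
with_dep    = refine_parts(parts, use_cross_dependency=True)
without_dep = refine_parts(parts, use_cross_dependency=False)
```

Reporting both variants on the same benchmarks would isolate whether the interactions, and not extra capacity, carry the gains.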

Figures

Figures reproduced from arXiv: 2604.17959 by Huan Zhao, Liu Wang, Wei Huang, Yujie Song, Yuxiang Zhao.

Figure 1. Our proposed CoEvoer achieves the mutual adaptation among different body parts through the mutual complemen…
Figure 2. Comparison of existing mesh recovery methods and ours: multi-stage frameworks use part-specific experts (face…
Figure 3. Qualitative visualization comparison on the UBody dataset. Each row shows, from left to right: the input image,…
Figure 4. Comparison of facial keypoint estimation results.
Figure 5. Comparison of mesh recovery results. The first col…
Original abstract

Expressive Human Pose and Shape Estimation (EHPS) plays a crucial role in various AR/VR applications and has witnessed significant progress in recent years. However, current state-of-the-art methods still struggle with accurate parameter estimation for facial and hand regions and exhibit limited generalization to wild images. To address these challenges, we present CoEvoer, a novel one-stage synergistic cross-dependency transformer framework tailored for upper-body EHPS. CoEvoer enables explicit feature-level interaction across different body parts, allowing for mutual enhancement through contextual information exchange. Specifically, larger and more easily estimated regions such as the torso provide global semantics and positional priors to guide the estimation of finer, more complex regions like the face and hands. Conversely, the localized details captured in facial and hand regions help refine and calibrate adjacent body parts. To the best of our knowledge, CoEvoer is the first framework designed specifically for upper-body EHPS, with the goal of capturing the strong coupling and semantic dependencies among the face, hands, and torso through joint parameter regression. Extensive experiments demonstrate that CoEvoer achieves state-of-the-art performance on upper-body benchmarks and exhibits strong generalization capability even on unseen wild images.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CoEvoer, a one-stage synergistic cross-dependency transformer framework for upper-body expressive human pose and shape estimation (EHPS). It models explicit feature-level interactions so that torso regions supply global semantics and positional priors to guide face and hand estimation while localized details from the latter refine adjacent parts, with the goal of joint parameter regression that captures coupling among these regions. The paper asserts that this yields state-of-the-art performance on upper-body benchmarks and strong generalization to unseen wild images.

Significance. If the central mechanism is validated, the work could advance EHPS by exploiting inter-part dependencies that current methods handle poorly for fine regions, offering a targeted one-stage alternative for upper-body applications in AR/VR. The emphasis on synergistic cross-dependency is a clear conceptual contribution, but its empirical grounding remains unverified.

major comments (2)
  1. [Experiments] Experiments section: no ablation is reported that removes the cross-attention modules (while retaining the rest of the architecture, backbone, and training schedule) to isolate whether observed gains on upper-body benchmarks arise from the proposed feature-level cross-dependency rather than increased capacity or other design choices.
  2. [Abstract and Experiments] Abstract and Experiments: generalization to wild images is asserted but appears supported only by qualitative examples; no quantitative metrics (e.g., error rates or success rates on held-out in-the-wild test sets) are provided to substantiate the claim that the cross-dependency improves robustness when torso cues are noisy.
minor comments (2)
  1. [Introduction] The claim that CoEvoer is 'the first framework designed specifically for upper-body EHPS' should be accompanied by a more explicit comparison table or discussion in the related-work section to distinguish it from prior full-body or part-specific methods.
  2. [Method] Notation for the cross-dependency modules (e.g., how torso-to-face and face-to-torso attention are formulated) could be clarified with a single equation or diagram reference to aid reproducibility.
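One plausible shape for the single equation the referee asks for, written in standard cross-attention notation (a generic reconstruction, not the paper's actual formulation):

```latex
% Torso-to-face cross-dependency; the face-to-torso direction is symmetric.
F'_{\mathrm{face}} = F_{\mathrm{face}}
  + \mathrm{softmax}\!\left(\frac{Q_{\mathrm{face}} K_{\mathrm{torso}}^{\top}}{\sqrt{d}}\right) V_{\mathrm{torso}},
\qquad
Q_{\mathrm{face}} = F_{\mathrm{face}} W_Q,\quad
K_{\mathrm{torso}} = F_{\mathrm{torso}} W_K,\quad
V_{\mathrm{torso}} = F_{\mathrm{torso}} W_V .
```

Whatever the paper's exact modules look like, a formula at this level of detail would make the torso-to-face and face-to-torso paths reproducible.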

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment point-by-point below, indicating the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: no ablation is reported that removes the cross-attention modules (while retaining the rest of the architecture, backbone, and training schedule) to isolate whether observed gains on upper-body benchmarks arise from the proposed feature-level cross-dependency rather than increased capacity or other design choices.

    Authors: We agree that an ablation isolating the cross-attention modules is necessary to confirm that performance gains derive from the cross-dependency mechanism rather than added capacity. In the revised manuscript we will add this experiment: the cross-attention modules will be removed while retaining the identical backbone, remaining architecture, and training schedule, with results reported on the same upper-body benchmarks to quantify the contribution. revision: yes

  2. Referee: [Abstract and Experiments] Abstract and Experiments: generalization to wild images is asserted but appears supported only by qualitative examples; no quantitative metrics (e.g., error rates or success rates on held-out in-the-wild test sets) are provided to substantiate the claim that the cross-dependency improves robustness when torso cues are noisy.

    Authors: We acknowledge that quantitative metrics on held-out in-the-wild sets would provide stronger substantiation. Our current evidence consists of qualitative results across diverse unseen wild images that illustrate improved robustness, including cases with noisy torso cues. In revision we will update the abstract and experiments section to qualify the generalization claims more precisely, expand the qualitative analysis with additional challenging examples, and discuss the practical difficulties of obtaining ground-truth annotations for such data. revision: partial

standing simulated objections (1 unresolved)
  • Quantitative metrics (error rates or success rates) on held-out in-the-wild test sets, because no suitable annotated benchmarks exist for upper-body expressive human pose and shape estimation in unconstrained wild scenarios.

Circularity Check

0 steps flagged

No significant circularity; architecture and claims are independent of inputs.

full rationale

The paper introduces CoEvoer as a novel one-stage cross-dependency transformer for upper-body EHPS, with the mechanism of torso priors guiding face/hands (and vice versa) presented as an explicit architectural design choice rather than a fitted parameter or self-referential definition. Performance claims rest on described experiments and benchmarks without any quoted reduction of results to the method's own inputs by construction. No self-citation load-bearing steps, uniqueness theorems from authors, or ansatz smuggling appear in the abstract or description. The derivation chain for the synergistic framework is self-contained as a proposed model, consistent with the absence of any self-definitional or fitted-input-called-prediction patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the effectiveness of the proposed transformer architecture and the assumption of beneficial cross-part interactions, with no specific free parameters detailed in the abstract.

axioms (1)
  • domain assumption Body parts in upper-body images have strong semantic dependencies that can be exploited for mutual improvement in estimation.
    Central to the cross-dependency design.
invented entities (1)
  • CoEvoer no independent evidence
    purpose: A one-stage synergistic cross-dependency transformer for EHPS
    Newly proposed model without independent validation beyond claimed experiments.

pith-pipeline@v0.9.0 · 5515 in / 1363 out tokens · 55775 ms · 2026-05-10T05:32:46.464567+00:00 · methodology

