pith. machine review for the scientific record.

arxiv: 1805.06485 · v2 · submitted 2018-05-16 · 💻 cs.CV

Recognition: unknown

QuaterNet: A Quaternion-based Recurrent Model for Human Motion

classification 💻 cs.CV
keywords: quaternet, angle, errors, human, joint, recurrent, rotations, skeleton

Deep learning for predicting or generating 3D human pose sequences is an active research area. Previous work regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angle or exponential map parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. Our recurrent network, QuaterNet, represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. On short-term predictions, QuaterNet improves the state-of-the-art quantitatively. For long-term generation, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature.
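The idea of penalizing absolute position errors through forward kinematics can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`qmul`, `qrot`, `forward_kinematics`, `position_loss`), the (w, x, y, z) quaternion ordering, the rule that parents are listed before their children, and the toy three-joint chain are all assumptions made for illustration.

```python
import numpy as np

def qmul(q, r):
    """Hamilton product of two quaternions in (w, x, y, z) order (assumed convention)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def qrot(q, v):
    """Rotate the 3-vector v by the unit quaternion q via q * (0, v) * conj(q)."""
    qv = np.concatenate(([0.0], v))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return qmul(qmul(q, qv), q_conj)[1:]

def forward_kinematics(rotations, offsets, parents):
    """Map local joint rotations to world-space joint positions.

    rotations: (J, 4) local quaternions, normalized here before use.
    offsets:   (J, 3) fixed bone offsets in each parent's frame.
    parents:   parent index per joint, -1 for the root; parents must
               appear before their children (assumption of this sketch).
    """
    n_joints = len(parents)
    world_rot = [None] * n_joints
    world_pos = np.zeros((n_joints, 3))
    for j in range(n_joints):
        q = rotations[j] / np.linalg.norm(rotations[j])
        p = parents[j]
        if p < 0:
            world_rot[j] = q
            world_pos[j] = offsets[j]
        else:
            # Bone lengths come from the fixed offsets, so they cannot stretch.
            world_pos[j] = world_pos[p] + qrot(world_rot[p], offsets[j])
            world_rot[j] = qmul(world_rot[p], q)
    return world_pos

def position_loss(pred_rot, target_rot, offsets, parents):
    """Mean Euclidean distance between predicted and target joint positions."""
    pred = forward_kinematics(pred_rot, offsets, parents)
    target = forward_kinematics(target_rot, offsets, parents)
    return float(np.mean(np.linalg.norm(pred - target, axis=1)))

# Toy chain: root -> joint 1 -> joint 2, each bone one unit along x.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
identity = np.array([1.0, 0.0, 0.0, 0.0])
target_rot = np.stack([identity, identity, identity])

# Prediction is wrong only at the root: a 90 degree rotation about z.
half = np.pi / 4
root_err = np.array([np.cos(half), 0.0, 0.0, np.sin(half)])
pred_rot = np.stack([root_err, identity, identity])

print(forward_kinematics(pred_rot, offsets, parents))
print(position_loss(pred_rot, target_rot, offsets, parents))
```

In this toy chain a single rotation error at the root displaces the end joint twice as far as the middle joint. A position loss therefore weights that error by its effect along the kinematic chain, whereas a per-joint angle loss would count it once.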

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. EggHand: A Multimodal Foundation Model for Egocentric Hand Pose Forecasting

    cs.CV 2026-05 unverdicted novelty 6.0

    EggHand unifies VLA action decoding with viewpoint-aware video-text encoding to forecast egocentric hand poses, achieving SOTA accuracy on EgoExo4D while remaining robust to ego-motion and controllable via language prompts.

  2. Next-Scale Autoregressive Models for Text-to-Motion Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    MoScale introduces a hierarchical next-scale autoregressive framework for text-to-motion generation that achieves state-of-the-art performance by refining motions from coarse to fine temporal resolutions.