3D human pose estimation in video with temporal convolutions and semi-supervised training
3 Pith papers cite this work.
Abstract
In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses, and finally back-project them to the input 2D keypoints. In the supervised setting, our fully convolutional model outperforms the previous best result from the literature by 6 mm in mean per-joint position error (MPJPE) on Human3.6M, corresponding to an error reduction of 11%, and it also shows significant improvements on HumanEva-I. Moreover, experiments show that back-projection comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
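The two ideas in the abstract, lifting 2D keypoints to 3D with dilated temporal convolutions and training on unlabeled video by back-projecting to 2D, can be sketched in a few lines of NumPy. This is a minimal, untrained illustration, not the code in the VideoPose3D repository: the kernel width of 3, the dilation schedule (1, 3, 9, 27, 81), the hidden width, and all helper names (`dilated_conv1d`, `receptive_field`, `make_layers`, `temporal_lift`, `project`, `back_projection_loss`) are assumptions chosen for illustration. It omits residual connections, batch normalization, and dropout, and it treats back-projection as a plain pinhole reprojection error, assuming the predicted 3D points are already in camera coordinates with positive depth.

```python
import numpy as np

# Hypothetical sketch: layer sizes, dilations and helper names are assumptions,
# not the VideoPose3D implementation. No residuals, batch norm, or training.


def dilated_conv1d(x, weight, bias, dilation):
    """Unpadded dilated 1D convolution over time.

    x: (T, C_in), weight: (K, C_in, C_out), bias: (C_out,).
    Returns an array of shape (T - (K - 1) * dilation, C_out).
    """
    k_width = weight.shape[0]
    t_out = x.shape[0] - (k_width - 1) * dilation
    out = np.zeros((t_out, weight.shape[2]))
    for k in range(k_width):
        # Tap k reads frames spaced `dilation` apart from each window start.
        out += x[k * dilation : k * dilation + t_out] @ weight[k]
    return out + bias


def receptive_field(kernel_width, dilations):
    """Number of input frames that influence one output frame."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


def make_layers(num_joints, hidden=64, kernel_width=3,
                dilations=(1, 3, 9, 27, 81), seed=0):
    """Random (untrained) weights: dilated layers, then a 1x1 output layer."""
    rng = np.random.default_rng(seed)
    layers, c_in = [], num_joints * 2  # input: (x, y) per joint, flattened
    for d in dilations:
        w = rng.normal(0.0, 0.1, size=(kernel_width, c_in, hidden))
        layers.append((w, np.zeros(hidden), d))
        c_in = hidden
    w_out = rng.normal(0.0, 0.1, size=(1, c_in, num_joints * 3))
    layers.append((w_out, np.zeros(num_joints * 3), 1))
    return layers


def temporal_lift(kp2d, layers):
    """Map 2D keypoints (T, J, 2) to 3D poses (T_out, J, 3)."""
    t, j, _ = kp2d.shape
    h = kp2d.reshape(t, j * 2)
    for i, (w, b, d) in enumerate(layers):
        h = dilated_conv1d(h, w, b, d)
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h.reshape(h.shape[0], j, 3)


def project(points3d, focal, center):
    """Pinhole projection of camera-space points (..., 3), z > 0, to pixels (..., 2)."""
    return focal * points3d[..., :2] / points3d[..., 2:3] + center


def back_projection_loss(pred3d, kp2d, focal, center):
    """Mean pixel distance between re-projected 3D predictions and the input 2D keypoints."""
    reproj = project(pred3d, focal, center)
    return float(np.mean(np.linalg.norm(reproj - kp2d, axis=-1)))
```

With the assumed schedule, each output pose depends on 1 + 2 * (1 + 3 + 9 + 27 + 81) = 243 input frames, so an unpadded clip of T frames yields T - 242 poses. Because every layer is a convolution over time, all output frames are computed in parallel rather than one step at a time. Connecting `back_projection_loss` to `temporal_lift` would additionally require predicting where the pose sits in the camera frame, which this sketch leaves out.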
Citation summary
Fields: cs.CV (3)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- Velocity and stroke rate reconstruction of canoe sprint team boats based on panned and zoomed video recordings
  A YOLOv8- and homography-based system reconstructs canoe boat velocity (MAPE 0.011) and stroke rate (MAPE 0.009) from video, closely matching GPS measurements.
- Dual-stream Spatio-Temporal GCN-Transformer Network for 3D Human Pose Estimation
  MixTGFormer reports state-of-the-art 3D pose estimation errors of 37.6 mm on Human3.6M and 15.7 mm on MPI-INF-3DHP, using parallel GCN-Transformer streams with squeeze-and-excitation (SE) layers to fuse local and global features.
- A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
  A multitask framework lifts 2D keypoints to 3D poses with a two-stream network, then applies ENAS to model spatio-temporal pose evolution for action recognition on the Human3.6M, MSR Action3D, and SBU datasets.
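The abstract and the citing-paper summaries report accuracy with two different metrics: pose errors in millimetres (mean per-joint position error) and video-versus-GPS agreement as MAPE. A minimal NumPy sketch of both follows. It assumes MPJPE is computed on poses that have already been aligned (results on Human3.6M are commonly reported after aligning predictions at the root joint) and that the MAPE values quoted above are fractions (0.011 meaning 1.1%) rather than percentages; the function names are illustrative.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: arrays of shape (..., J, 3) in the same units (e.g. mm).
    Returns the Euclidean distance per joint, averaged over joints and frames.
    Assumes any root alignment has already been applied (an assumption).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))


def mape(pred, reference):
    """Mean absolute percentage error, returned as a fraction (0.011 == 1.1%).

    reference must be nonzero everywhere (e.g. GPS boat velocity samples).
    """
    pred = np.asarray(pred, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs((pred - reference) / reference)))
```

For example, video-derived velocities of 10.5 and 9.5 m/s against a GPS reference of 10.0 m/s each give a MAPE of 0.05. MPJPE is in the units of the input coordinates, which is why pose results are quoted in millimetres, while MAPE is unitless.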