3D human pose estimation in video with temporal convolutions and semi-supervised training

2018 · cs.CV · arXiv 1811.11742

3 Pith papers cite this work. Polarity classification is still indexing.


abstract

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
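The two quantities the abstract relies on, mean per-joint position error (MPJPE) and the back-projection of an estimated 3D pose onto the input 2D keypoints, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mpjpe`, `project_pinhole`, and `reprojection_loss`, the parameters `focal` and `center`, and the ideal pinhole camera without lens distortion are all assumptions made for illustration. The abstract does not specify the camera model or the exact loss.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean Euclidean distance between predicted and reference joints.

    Both arrays have shape (..., joints, 3); the result is in the input units.
    """
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))

def project_pinhole(pose_3d, focal, center):
    """Project camera-space joints (..., joints, 3) to 2D pixel coordinates.

    Assumption: an ideal pinhole camera with no lens distortion.
    """
    xy = pose_3d[..., :2] / pose_3d[..., 2:3]  # perspective divide by depth
    return focal * xy + center

def reprojection_loss(pose_3d, keypoints_2d, focal, center):
    """Mean 2D distance between back-projected joints and input keypoints."""
    projected = project_pinhole(pose_3d, focal, center)
    return float(np.mean(np.linalg.norm(projected - keypoints_2d, axis=-1)))

# Hypothetical toy data: one frame, two joints, coordinates in millimetres.
target = np.array([[[0.0, 0.0, 4000.0], [1000.0, 0.0, 4000.0]]])
pred = target + np.array([0.0, 6.0, 0.0])  # every joint is off by 6 mm

focal = 1000.0                      # assumed focal length in pixels
center = np.array([500.0, 500.0])   # assumed principal point in pixels
keypoints_2d = project_pinhole(target, focal, center)

print(mpjpe(pred, target))
print(reprojection_loss(target, keypoints_2d, focal, center))
```

In a semi-supervised setting of the kind the abstract describes, a term like `reprojection_loss` can be evaluated on unlabeled video because it needs only 2D keypoints, whereas `mpjpe` requires 3D ground truth.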

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Velocity and stroke rate reconstruction of canoe sprint team boats based on panned and zoomed video recordings

cs.CV · 2026-02-26 · conditional · novelty 6.0

A YOLOv8 and homography-based system reconstructs canoe boat velocity with MAPE 0.011 and stroke rate with MAPE 0.009 from video, matching GPS closely.

Dual-stream Spatio-Temporal GCN-Transformer Network for 3D Human Pose Estimation

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

MixTGFormer reports state-of-the-art 3D pose estimation errors of 37.6 mm on Human3.6M and 15.7 mm on MPI-INF-3DHP by using parallel GCN-Transformer streams with SE layers for local-global feature fusion.

A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera

cs.CV · 2019-07-16 · unverdicted · novelty 3.0

A multitask framework lifts 2D keypoints to 3D poses via a two-stream network then applies ENAS to model spatio-temporal pose evolution for action recognition on Human3.6M, MSR Action3D and SBU datasets.

citing papers explorer

Showing 3 of 3 citing papers.

Velocity and stroke rate reconstruction of canoe sprint team boats based on panned and zoomed video recordings
cs.CV · 2026-02-26 · conditional · none · ref 13 · internal anchor

Dual-stream Spatio-Temporal GCN-Transformer Network for 3D Human Pose Estimation
cs.CV · 2026-04-20 · unverdicted · none · ref 35

A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
cs.CV · 2019-07-16 · unverdicted · none · ref 27 · internal anchor
