Can 3D Pose be Learned from 2D Projections Alone?

Ambrish Tyagi; Amit Agrawal; Ching-Hang Chen; Cong Phuoc Huynh; Dylan Drover; Rohith MV

arxiv: 1808.07182 · v1 · pith:3EN45ZRBnew · submitted 2018-08-22 · 💻 cs.CV

Dylan Drover, Rohith MV, Ching-Hang Chen, Amit Agrawal, Ambrish Tyagi, Cong Phuoc Huynh

classification 💻 cs.CV

keywords: pose, approach, discriminator, supervised, estimation, generated, generator, given

3D pose estimation from a single image is a challenging task in computer vision. We present a weakly supervised approach to estimate 3D pose points, given only 2D pose landmarks. Our method does not require correspondences between 2D and 3D points to build explicit 3D priors. We utilize an adversarial framework to impose a prior on the 3D structure, learned solely from random 2D projections of that structure. Given a set of 2D pose landmarks, the generator network hypothesizes their depths to obtain a 3D skeleton. We propose a novel Random Projection layer, which randomly projects the generated 3D skeleton and sends the resulting 2D pose to the discriminator. The discriminator improves by discriminating between the generated poses and pose samples from a real distribution of 2D poses. Training does not require correspondence between the 2D inputs to either the generator or the discriminator. We apply our approach to the task of 3D human pose estimation. Results on the Human3.6M dataset demonstrate that our approach outperforms many previous supervised and weakly supervised approaches.
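
The lift-then-reproject step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the camera model, so the perspective projection, the `camera_distance` value, and the rotation about the vertical axis through the first joint are all assumptions made for illustration. The names `lift_to_3d` and `random_projection` are hypothetical, and `depth_offsets` merely stands in for the output of a generator network; no network is trained here.

```python
import numpy as np


def lift_to_3d(pose_2d, depth_offsets, camera_distance=10.0):
    """Back-project 2D landmarks to 3D from hypothesized per-joint depths.

    pose_2d:        (J, 2) array of 2D landmarks.
    depth_offsets:  (J,) array, a stand-in for the generator's output.
    camera_distance is an assumed constant, not taken from the abstract.
    """
    # Assumption: clamp depths so every joint stays in front of the camera.
    z = np.maximum(1.0, camera_distance + depth_offsets)
    xy = pose_2d * z[:, None]  # invert the perspective division
    return np.concatenate([xy, z[:, None]], axis=1)


def random_projection(pose_3d, rng, azimuth=None):
    """Rotate the skeleton by a random angle and project it back to 2D.

    Assumption: the rotation is about the vertical (y) axis through the
    first joint, and the projection is a simple perspective division.
    """
    if azimuth is None:
        azimuth = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(azimuth), np.sin(azimuth)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
    root = pose_3d[0]
    rotated = (pose_3d - root) @ rot_y.T + root
    return rotated[:, :2] / rotated[:, 2:3]


# Usage: a synthetic 17-joint pose with made-up depth offsets.
rng = np.random.default_rng(0)
pose_2d = rng.uniform(-0.1, 0.1, size=(17, 2))
depth_offsets = rng.uniform(-0.5, 0.5, size=17)

pose_3d = lift_to_3d(pose_2d, depth_offsets)
reprojected = random_projection(pose_3d, rng)  # would be fed to a discriminator
```

In an adversarial setup along the lines the abstract describes, `reprojected` would be scored by a discriminator against real 2D poses, so the depths must be plausible for the skeleton to look realistic from an arbitrary viewpoint. With a zero rotation angle, the sketch reproduces the input 2D pose, which is a convenient sanity check.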

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera
cs.CV 2019-07 unverdicted novelty 7.0

A dual-branch decoder network trained on the new xR-EgoPose synthetic dataset achieves state-of-the-art egocentric 3D pose estimation from HMD fish-eye cameras and generalizes to real footage.