
arxiv: 1712.01955 · v1 · pith:WPKDMEC2new · submitted 2017-12-05 · 💻 cs.CV

Learning to Forecast Videos of Human Activity with Multi-granularity Models and Adaptive Rendering

classification 💻 cs.CV
keywords human, rendering, activity, appearance, complex, model, adaptive, appearances

We propose an approach for forecasting video of complex human activity involving multiple people. Direct pixel-level prediction is too simple to handle the appearance variability in complex activities. Hence, we develop novel intermediate representations. Our architecture combines a hierarchical temporal model for predicting human poses with encoder-decoder convolutional neural networks for rendering target appearances. The hierarchical model captures interactions among people by adopting a dynamic group-based interaction mechanism. The appearance rendering network then encodes the targets' appearances by learning adaptive appearance filters using a fully convolutional network. Finally, these filters are placed in the encoder-decoder networks to complete the rendering. We demonstrate that our model generates videos superior to those of state-of-the-art methods and handles complex human activity scenarios in video forecasting.
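
The idea of adaptive appearance filters can be illustrated with a toy sketch. This is not the paper's implementation: the abstract states only that a fully convolutional network learns the filters and that they are placed in encoder-decoder networks. Here a single linear map (the hypothetical `predict_filters`) stands in for that network, a random array stands in for a pose feature map, and all names, shapes, and sizes are assumptions chosen for illustration. The point is that the filter weights depend on each target's appearance, so the same pose input is rendered differently per person.

```python
import numpy as np

def predict_filters(appearance, weights, k=3):
    """Hypothetical filter generator: linear map from an appearance code to one k x k filter."""
    return (weights @ appearance).reshape(k, k)

def apply_filter(feature_map, kernel):
    """Valid-mode 2D cross-correlation of one feature map with one kernel."""
    k = kernel.shape[0]
    h, w = feature_map.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feature_map[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
weights = rng.normal(size=(9, 4))   # assumed stand-in for a learned filter-generating network
pose_map = rng.normal(size=(8, 8))  # assumed stand-in for a predicted pose feature map
person_a = rng.normal(size=4)       # appearance code of one target (illustrative)
person_b = rng.normal(size=4)       # appearance code of another target (illustrative)

# Same pose input, different appearance-conditioned filters.
out_a = apply_filter(pose_map, predict_filters(person_a, weights))
out_b = apply_filter(pose_map, predict_filters(person_b, weights))
```

In a fixed-filter network the kernel would be identical for every input; here it is recomputed from each appearance code, which is the sense of "adaptive" this sketch assumes.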
