Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

Amir Sadeghian; Ian Reid; Roberto Martín-Martín; S. Hamid Rezatofighi; Silvio Savarese; Vineet Kosaraju

arxiv: 1907.03395 · v2 · pith:WD2J7IPVnew · submitted 2019-07-04 · 💻 cs.CV · cs.LG


Pith reviewed 2026-05-25 08:54 UTC · model grok-4.3


keywords trajectory forecasting · pedestrian prediction · graph attention · generative adversarial network · multimodal trajectories · social interactions · Bicycle-GAN


The pith

Social-BiGAT uses graph attention and Bicycle-GAN to generate multimodal pedestrian trajectory forecasts that outperform prior methods on standard benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a generative model for predicting multiple possible future paths of pedestrians who interact socially with each other and with their environment. It encodes observed positions and velocities into a graph whose attention weights capture how one person's movement affects others, then uses a recurrent encoder-decoder, trained adversarially in a Bicycle-GAN setup, to map latent noise to full-scene predictions and back. A reader would care if this approach produces more realistic and varied predictions than single-mode or non-social models, which matters for applications like autonomous driving. The framework is reported to reach state-of-the-art results against several baselines on common trajectory datasets.

Core claim

Social-BiGAT is a graph-based generative adversarial network that generates realistic, multimodal trajectory predictions by modelling social interactions via a graph attention network and forming a reversible transformation between each scene and its latent noise vector using Bicycle-GAN, achieving state-of-the-art performance on existing trajectory forecasting benchmarks.

What carries the argument

Graph attention network that learns feature representations encoding social interactions between humans, combined with a recurrent encoder-decoder trained adversarially using Bicycle-GAN's reversible latent mapping.

If this is right

Multimodal predictions become possible through the reversible latent mapping rather than producing only averaged paths.
Social interactions are explicitly modeled by attention over a graph of observed pedestrian positions and velocities.
The recurrent architecture handles the sequential nature of trajectory data while the GAN ensures realism.
State-of-the-art accuracy holds across multiple existing benchmarks compared to prior baselines.
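
The attention step named in the points above can be sketched in a few lines of Python. This is a minimal single-head graph attention layer in the style of the cited Graph Attention Networks work, applied to a fully connected graph of pedestrians. The feature layout `(x, y, vx, vy)`, the random weights, and the names `gat_layer`, `W`, and `a` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, W, a):
    """Single-head graph attention over a fully connected pedestrian graph.

    features: one feature vector per pedestrian (assumed here: x, y, vx, vy)
    W:        shared linear map, shape out_dim x in_dim
    a:        attention vector of length 2 * out_dim
    Returns (new_features, attention), where attention[i][j] is the weight
    pedestrian i places on pedestrian j.
    """
    z = [matvec(W, h) for h in features]
    n = len(z)
    attention, new_features = [], []
    for i in range(n):
        # Score every neighbour j from the concatenation [z_i || z_j].
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in range(n)
        ]
        # Numerically stable softmax over neighbours.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attention.append(alpha)
        # Attention-weighted sum of the transformed neighbour features.
        new_features.append(
            [sum(alpha[j] * z[j][k] for j in range(n)) for k in range(len(z[0]))]
        )
    return new_features, attention


# Hypothetical scene with three pedestrians: (x, y, vx, vy).
rng = random.Random(0)
peds = [[0.0, 0.0, 1.0, 0.0], [2.0, 0.1, -1.0, 0.0], [5.0, 3.0, 0.0, -0.5]]
out_dim = 3
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(out_dim)]
a = [rng.uniform(-1, 1) for _ in range(2 * out_dim)]
new_features, attention = gat_layer(peds, W, a)
```

Each row of `attention` sums to one, so it can be read as how strongly a given pedestrian attends to every pedestrian in the scene, itself included. In a trained model `W` and `a` would be learned rather than sampled.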

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the approach generalizes, it could extend to forecasting trajectories of other agents such as vehicles by building similar interaction graphs.
The reversible mapping might enable sampling diverse futures conditioned on partial observations in real time.
Performance gains may depend on the quality of the observed graph structure, suggesting tests on datasets with varying crowd densities.

Load-bearing premise

That a graph attention network on observed pedestrian positions and velocities is sufficient to encode all relevant social interactions and that the Bicycle-GAN reversible latent mapping transfers effectively to trajectory data without additional scene-specific constraints.

What would settle it

Evaluation on a dataset containing interactions driven by factors outside observed positions and velocities, such as intent signals or static obstacles, would reveal if prediction accuracy drops below baselines.

Figures

Figures reproduced from arXiv: 1907.03395 by Amir Sadeghian, Ian Reid, Roberto Martín-Martín, S. Hamid Rezatofighi, Silvio Savarese, Vineet Kosaraju.

**Figure 1.** We show multimodal behavior for the blue pedestrian, who must make a decision about which direction they will take to avoid the red-green pedestrian group.

**Figure 2.** Architecture for the proposed Social-BiGAT model. The model consists of a single generator, two discriminators (one at local pedestrian scale, and one at global scene scale), and a latent encoder that learns noise from scenes. The model makes use of a graph attention network (GAT) and self-attention on an image to consider the social and physical features of a scene.

**Figure 3.** Training process for the Social-BiGAT model. We teach the generator and discriminators using traditional adversarial learning techniques, with an additional L2 loss on generated samples to encourage consistency. We further train the latent encoder by ensuring it can recreate noise passed into the generator, and by making sure it mirrors a normal distribution.

**Figure 4.** Generated trajectories visualized for the S-GAN-P, SoPhie, and Social-BiGAT models across four main scenes. Observed trajectories are shown as solid lines, ground truth future movements are shown as dashed lines, and generated samples are shown as contour maps. Different colors correspond to different pedestrians.

**Figure 5.** Visualization of generated trajectories (dashed lines), given observed trajectories (solid lines) for various (color-coded) pedestrians, while varying z, the noise passed into the generator. We note several modes of behavior, including avoidance versus aggressiveness (a), linearity versus curvature (b), and fast versus slow (c).

Original abstract

Predicting the future trajectories of multiple interacting agents in a scene has become an increasingly important problem for many different applications ranging from control of autonomous vehicles and social robots to security and surveillance. This problem is compounded by the presence of social interactions between humans and their physical interactions with the scene. While the existing literature has explored some of these cues, they mainly ignored the multimodal nature of each human's future trajectory. In this paper, we present Social-BiGAT, a graph-based generative adversarial network that generates realistic, multimodal trajectory predictions by better modelling the social interactions of pedestrians in a scene. Our method is based on a graph attention network (GAT) that learns reliable feature representations that encode the social interactions between humans in the scene, and a recurrent encoder-decoder architecture that is trained adversarially to predict, based on the features, the humans' paths. We explicitly account for the multimodal nature of the prediction problem by forming a reversible transformation between each scene and its latent noise vector, as in Bicycle-GAN. We show that our framework achieves state-of-the-art performance comparing it to several baselines on existing trajectory forecasting benchmarks.
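
The Figure 3 caption describes two objectives for the latent encoder: recovering the noise that was passed into the generator, and keeping the encoded distribution close to a normal distribution. A minimal Python sketch of those two terms follows. It assumes the encoder outputs a diagonal Gaussian (`mu`, `log_var`) and uses an L1 reconstruction distance as in the original Bicycle-GAN formulation; the function names and the choice of norm are illustrative assumptions, not the authors' code.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def latent_reconstruction_loss(z, z_recovered):
    """Mean absolute error between the sampled noise z and the encoder's
    estimate of it from the generated trajectories (L1 norm assumed)."""
    return sum(abs(p - q) for p, q in zip(z, z_recovered)) / len(z)


# An encoder output that already matches N(0, I) incurs no KL penalty.
kl_matched = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])   # 0.0
# Shifting the mean by one unit in one dimension costs 0.5.
kl_shifted = kl_to_standard_normal([1.0], [0.0])             # 0.5
# Hypothetical noise vector and the encoder's reconstruction of it.
recon = latent_reconstruction_loss([1.0, 2.0], [1.0, 4.0])   # 1.0
```

In a full training loop these two terms would be weighted and added to the adversarial and L2 losses mentioned in the same caption; the weights are hyperparameters not given in this summary.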

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Social-BiGAT pairs GAT social encoding with Bicycle-GAN for multimodal trajectories and claims SOTA, but the abstract gives almost no experimental detail to evaluate the gains.


This paper takes a graph attention network to encode interactions from observed pedestrian positions and velocities, then feeds those features into a Bicycle-GAN setup with a recurrent encoder-decoder to produce multiple future paths. The reversible latent mapping is meant to capture the multimodal nature of the problem while the GAT handles the social part. That combination is the main technical step; it is a direct application of two existing tools to the trajectory domain rather than a new theoretical framework.

The approach is coherent and addresses the two issues the abstract flags as central (social interactions and multimodality) without adding extra scene geometry inputs. That keeps the model relatively clean. The adversarial training objective is a standard way to push for realistic sequences.

The soft spot is the lack of any reported experimental setup, baseline definitions, metrics, or ablations in the abstract. The SOTA claim therefore rests on unshown comparisons, and it is impossible to tell whether the reported improvements come from the GAT-Bicycle-GAN pairing or from other factors. The model also builds its graph only on past trajectories and omits scene layout or obstacles, even though the introduction notes physical scene interactions as a core difficulty. If those omissions matter on the benchmarks, the gains could be narrower than claimed. The stress-test worry about whether the image-oriented Bicycle-GAN components transfer cleanly to variable-length paths is reasonable and would need checking in the full results.

This is incremental but competent applied work in the social navigation area. A reader working on pedestrian prediction or robot planning could extract a usable method if the numbers hold. It is internally consistent and cites the relevant prior literature. I would bring it to a reading group to go through the experiments. It deserves peer review so referees can verify the benchmark results and ablations.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces Social-BiGAT, a graph-based generative adversarial network for multimodal pedestrian trajectory forecasting. It employs a graph attention network (GAT) to encode social interactions from observed positions and velocities, combined with a recurrent encoder-decoder trained adversarially via the Bicycle-GAN reversible latent mapping to generate multiple plausible futures, and claims state-of-the-art results on standard benchmarks relative to prior baselines.

Significance. If the benchmark gains prove robust, the work would show that attention-based social encoding plus a reversible latent-space GAN can improve multimodal prediction quality over earlier social-pooling approaches, with potential value for autonomous navigation and surveillance where both interaction modeling and output diversity matter.

major comments (3)

[Abstract] Abstract: the central claim that the framework 'achieves state-of-the-art performance' is unsupported by any reported metrics (minADE, minFDE, etc.), baseline definitions, dataset names, or quantitative tables, rendering the empirical contribution unverifiable from the provided text.
[Method] Method description: the GAT is constructed solely from past positions and velocities with no scene layout, obstacles, or physical constraints, even though the abstract explicitly flags 'physical interactions with the scene' as a core difficulty; this omission is load-bearing for the claim of improved social modeling.
[Method] Method / Experiments: no evidence is given that the Bicycle-GAN latent regression (originally for image translation) preserves sequence-level multimodality or produces collision-free futures on variable-length pedestrian paths; without ablations or adaptation details the reported gains could be artifacts of evaluation protocol.

minor comments (1)

[Abstract] Abstract: the phrasing 'comparing it to several baselines' is imprecise; the specific baselines, metrics, and number of samples per prediction should be stated.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, indicating where we agree and plan revisions to strengthen the manuscript.


Referee: [Abstract] Abstract: the central claim that the framework 'achieves state-of-the-art performance' is unsupported by any reported metrics (minADE, minFDE, etc.), baseline definitions, dataset names, or quantitative tables, rendering the empirical contribution unverifiable from the provided text.

Authors: The abstract provides a high-level summary and omits specific numerical results due to typical length constraints. The full manuscript contains quantitative evaluations in the Experiments section, with tables reporting minADE, minFDE, and other metrics on the ETH and UCY datasets against baselines including Social-LSTM and Social-GAN. We will revise the abstract to include a concise reference to the benchmarks and primary metrics.

revision: partial

Referee: [Method] Method description: the GAT is constructed solely from past positions and velocities with no scene layout, obstacles, or physical constraints, even though the abstract explicitly flags 'physical interactions with the scene' as a core difficulty; this omission is load-bearing for the claim of improved social modeling.

Authors: The method indeed encodes only trajectory-based social interactions via GAT and does not model scene layout or physical constraints. The abstract introduces physical interactions as a general challenge in the problem domain, while our focus is on social modeling. We agree this distinction should be clearer and will revise the abstract to state explicitly that the work addresses social interactions without incorporating physical scene elements.

revision: yes

Referee: [Method] Method / Experiments: no evidence is given that the Bicycle-GAN latent regression (originally for image translation) preserves sequence-level multimodality or produces collision-free futures on variable-length pedestrian paths; without ablations or adaptation details the reported gains could be artifacts of evaluation protocol.

Authors: Section 3.2 describes the adaptation of the Bicycle-GAN reversible mapping to the recurrent encoder-decoder for sequence data to support multimodality. Standard evaluation metrics (best-of-many) and qualitative trajectory visualizations are provided. Explicit ablations on collision rates or sequence-level multimodality preservation are not included. We will expand the discussion of the adaptation details and add relevant qualitative analysis in the revision.

revision: partial
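
The best-of-many metrics named in this exchange (minADE and minFDE) can be illustrated with a short Python sketch. It uses the usual definitions: average displacement error is the mean Euclidean distance over predicted timesteps, final displacement error is the distance at the last timestep, and the minimum is taken over K sampled futures. The toy trajectories and function names are hypothetical, and the paper's exact protocol (number of samples, horizon, units) is not stated in this summary.

```python
import math


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over timesteps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last timestep."""
    return math.dist(pred[-1], truth[-1])


def best_of_k(samples, truth):
    """Return (minADE, minFDE) over K sampled future trajectories."""
    return (
        min(ade(s, truth) for s in samples),
        min(fde(s, truth) for s in samples),
    )


# Hypothetical ground-truth future and two sampled predictions.
truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
samples = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # accurate early, drifts at the end
    [(1.0, 0.5), (2.0, 0.5), (3.0, 0.5)],  # constant lateral offset
]
min_ade, min_fde = best_of_k(samples, truth)
```

Here the first sample gives the lower ADE (1/3) while the second gives the lower FDE (0.5), which shows that the two minima can come from different samples. This is one reason a referee may ask for the number of samples per prediction to be reported alongside the scores.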

Circularity Check

0 steps flagged

No circularity: empirical benchmark validation of GAT+Bicycle-GAN architecture


The paper advances an empirical model (GAT encoder on observed trajectories plus Bicycle-GAN reversible latent mapping inside a recurrent encoder-decoder) and supports its central claim solely by reporting minADE/FDE and qualitative results against external baselines on public datasets. No derivation chain, uniqueness theorem, or first-principles prediction is asserted; the multimodal handling is explicitly adopted from the cited Bicycle-GAN work rather than derived internally, and performance numbers are produced by training and evaluation rather than by algebraic reduction to the inputs. The architecture choices are therefore independent of the reported numbers and remain open to falsification on new data.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions about graph attention capturing social dynamics and the applicability of Bicycle-GAN to trajectory distributions; no new entities are postulated and free parameters are the usual deep-learning hyperparameters.

free parameters (1)

model hyperparameters
Typical deep network training choices including learning rates, layer sizes, and attention heads that are tuned on validation data.

axioms (2)

domain assumption: Graph attention networks learn reliable feature representations that encode social interactions between humans.
Invoked when describing the GAT component that processes pedestrian features.
domain assumption: Bicycle-GAN reversible transformation between scene and latent noise vector models the multimodal nature of future trajectories.
Used to justify the generative component for producing diverse predictions.

pith-pipeline@v0.9.0 · 5758 in / 1322 out tokens · 38824 ms · 2026-05-25T08:54:24.098650+00:00 · methodology



[1] [1]

Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, and Silvio Savarese

Timur M. Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, and Silvio Savarese. Social scene understanding: End-to-end multi-person action localization and collective activity recognition. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages 3425–3434, 2017

work page 2017

[2] [2]

Wei-Chiu Ma, De-An Huang, Namhoon Lee, and Kris M. Kitani. Forecasting interactive dynamics of pedestrians with ﬁctitious play. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4636–4644, 2017

work page 2017

[3] [3]

Autonomous exploration, active learning and human guidance with open-source poppy humanoid robot platform and explauto library

Sébastien Forestier, Yoan Mollard, Damien Caselli, and Pierre-Yves Oudeyer. Autonomous exploration, active learning and human guidance with open-source poppy humanoid robot platform and explauto library. In The Thirtieth Annual Conference on Neural Information Processing Systems (NIPS 2016) , 2016

work page 2016

[4] [4]

An assistive household robot–doing more than just cleaning

Julia Kantorovitch, Janne Väre, Vesa Pehkonen, Arto Laikari, and Heikki Seppälä. An assistive household robot–doing more than just cleaning. Journal of Assistive Technologies, 8(2):64–76, 2014

work page 2014

[5] [5]

A survey of vision-based trajectory learning and analysis for surveillance

Brendan Tran Morris and Mohan Manubhai Trivedi. A survey of vision-based trajectory learning and analysis for surveillance. IEEE transactions on circuits and systems for video technology, 18(8):1114–1127, 2008

work page 2008

[6] [6]

A large-scale benchmark dataset for event recognition in surveillance video

Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee, JK Aggarwal, Hyungtae Lee, Larry Davis, et al. A large-scale benchmark dataset for event recognition in surveillance video. In Computer vision and pattern recognition (CVPR), 2011 IEEE conference on, pages 3153–3160. IEEE, 2011

work page 2011

[7] [7]

Video surveillance and counterterror- ism: the application of suspicious activity recognition in visual surveillance systems to counterterrorism

Nick Mould, James L Regens, Carl J Jensen III, and David N Edger. Video surveillance and counterterror- ism: the application of suspicious activity recognition in visual surveillance systems to counterterrorism. Journal of Policing, Intelligence and Counter Terrorism, 9(2):151–175, 2014

work page 2014

[8] [8]

Real-world anomaly detection in surveillance videos

Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly detection in surveillance videos. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 6479–6488, 2018

work page 2018

[9] [9]

seeing is believing

Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Alessio Del Bue, Marco Cristani, and Fabio Galasso. "seeing is believing": Pedestrian trajectory forecasting using visual frustum of attention. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1178–1185, 2018

work page 2018

[10] [10]

Social lstm: Human trajectory prediction in crowded spaces

Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 961–971, 2016

[11]

Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan: Socially acceptable trajectories with generative adversarial networks. arXiv preprint arXiv:1803.10892, 2018

[12]

SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints

Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, and Silvio Savarese. SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints. In CVPR, 2019

[13]

Show, attend and tell: Neural image caption generation with visual attention

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057, 2015

[14]

Knowledge transfer for scene-specific motion prediction

Lamberto Ballan, Francesco Castaldo, Alexandre Alahi, Francesco Palmieri, and Silvio Savarese. Knowledge transfer for scene-specific motion prediction. In European Conference on Computer Vision, pages 697–713. Springer, 2016

[15]

Desire: Distant future prediction in dynamic scenes with interacting agents

Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B Choy, Philip HS Torr, and Manmohan Chandraker. Desire: Distant future prediction in dynamic scenes with interacting agents. 2017

[16]

CAR-Net: Clairvoyant Attentive Recurrent Network

Amir Sadeghian, Ferdinand Legros, Maxime Voisin, Ricky Vesel, Alexandre Alahi, and Silvio Savarese. Car-net: Clairvoyant attentive recurrent network. arXiv preprint arXiv:1711.10061, 2017

[17]

Learning social etiquette: Human trajectory understanding in crowded scenes

Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human trajectory understanding in crowded scenes. In European conference on computer vision, pages 549–565. Springer, 2016

[18]

Activity forecasting

Kris M Kitani, Brian D Ziebart, James Andrew Bagnell, and Martial Hebert. Activity forecasting. In European Conference on Computer Vision, pages 201–214. Springer, 2012

[19]

Social force model for pedestrian dynamics

Dirk Helbing and Peter Molnar. Social force model for pedestrian dynamics. Physical review E, 51(5): 4282, 1995

[20]

Improving data association by joint modeling of pedestrian trajectories and groupings

Stefano Pellegrini, Andreas Ess, and Luc Van Gool. Improving data association by joint modeling of pedestrian trajectories and groupings. In European conference on computer vision, pages 452–465. Springer, 2010

[21]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014

[22]

Graph Attention Networks

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Alejandro Romero, Pietro Lió, and Yoshua Bengio. Graph attention networks. CoRR, abs/1710.10903, 2018

[23]

Toward multimodal image-to-image translation

Jun-Yan Zhu, Richard Y. Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, and Eli Shechtman. Toward multimodal image-to-image translation. In NIPS, 2017

[24]

Socially-aware large-scale crowd forecasting

Alexandre Alahi, Vignesh Ramanathan, and Li Fei-Fei. Socially-aware large-scale crowd forecasting. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, number EPFL-CONF-230284, pages 2211–2218. IEEE, 2014

[25]

You’ll never walk alone: Modeling social behavior for multi-target tracking

Stefano Pellegrini, Andreas Ess, Konrad Schindler, and Luc Van Gool. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Computer Vision, 2009 IEEE 12th International Conference on, pages 261–268. IEEE, 2009

[26]

Abnormal crowd behavior detection using social force model

Ramin Mehran, Alexis Oyama, and Mubarak Shah. Abnormal crowd behavior detection using social force model. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 935–942. IEEE, 2009

[27]

Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection

Tharindu Fernando, Simon Denman, Sridha Sridharan, and Clinton Fookes. Soft + hardwired attention: An lstm framework for human trajectory prediction and abnormal event detection. arXiv preprint arXiv:1702.05552, 2017

[28]

Tree Memory Networks for Modelling Long-term Temporal Dependencies

Tharindu Fernando, Simon Denman, Aaron McFadyen, Sridha Sridharan, and Clinton Fookes. Tree memory networks for modelling long-term temporal dependencies. arXiv preprint arXiv:1703.04706, 2017

[29]

Context-Aware Trajectory Prediction

Federico Bartoli, Giuseppe Lisanti, Lamberto Ballan, and Alberto Del Bimbo. Context-aware trajectory prediction. arXiv preprint arXiv:1705.02503, 2017

[30]

Particle-based pedestrian path prediction using LSTM-MDL models

Ronny Hug, Stefan Becker, Wolfgang Hübner, and Michael Arens. Particle-based pedestrian path prediction using lstm-mdl models. arXiv preprint arXiv:1804.05546, 2018

[31]

Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs

Javad Amirian, Jean-Bernard Hayet, and Julien Pettré. Social ways: Learning multi-modal distributions of pedestrian trajectories with gans. CoRR, abs/1904.09507, 2019

[32]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2017

[33]

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5967–5976, 2017

[34]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, 2017

[35]

Infogan: Interpretable representation learning by information maximizing generative adversarial nets

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016

[36]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, 2017

[37]

Conditional Generative Adversarial Nets

Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014

[38]

Crowds by example

Alon Lerner, Yiorgos Chrysanthou, and Dani Lischinski. Crowds by example. In Computer Graphics Forum, volume 26, pages 655–664. Wiley Online Library, 2007
