Motion-Aware Feature for Improved Video Anomaly Detection

Shawn Newsam; Yi Zhu

arxiv: 1907.10211 · v1 · pith:7OPKOKU7new · submitted 2019-07-24 · cs.CV · cs.LG · eess.IV

Pith reviewed 2026-05-24 17:20 UTC · model grok-4.3

keywords: video anomaly detection · motion-aware feature · multiple instance learning · attention mechanism · temporal context · UCF Crime dataset · anomalous action recognition

The pith

A motion-aware feature from a temporal augmented network, paired with attention in a MIL ranking model, outperforms prior methods on video anomaly detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that motion cues drive effective anomaly detection in video, so it builds a temporal augmented network to learn motion-aware features. These features match earlier state-of-the-art results on their own and raise performance further when fused with existing methods. An attention block is added to the Multiple Instance Learning ranking model to incorporate temporal context and produce weights that separate anomalous from normal segments. The full combination yields large gains on both anomaly detection and anomalous action recognition in the UCF Crime dataset. Readers would care because reliable motion-based detection could support better automated monitoring of security footage.

Core claim

The authors show that a temporal augmented network produces a motion-aware feature which alone reaches competitive accuracy and, when combined with prior approaches, delivers significant gains. Adding an attention block to the temporal Multiple Instance Learning ranking model further improves the separation of anomalous from normal segments, and the full system outperforms prior work by a large margin on the anomaly detection and anomalous action recognition tasks of the UCF Crime dataset.

What carries the argument

The temporal augmented network that extracts the motion-aware feature, together with the attention block inside the temporal MIL ranking model that learns segment weights from temporal context.
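To make the second component concrete, the following is a minimal sketch of how attention weights could enter a MIL ranking loss. It is an illustration under stated assumptions, not the authors' formulation: the softmax normalisation, the weighted-sum bag score, the hinge form, and the names `softmax`, `bag_score`, `ranking_loss` and `margin` are all hypothetical choices that do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bag_score(segment_scores, attention_logits):
    """Attention-weighted video (bag) score: sum_i a_i * s_i.

    Assumption: attention weights are a softmax over per-segment logits.
    """
    if len(segment_scores) != len(attention_logits):
        raise ValueError("one attention logit is needed per segment")
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, segment_scores))


def ranking_loss(anom_scores, anom_logits, norm_scores, norm_logits, margin=1.0):
    """Hinge ranking loss: the anomalous bag should outscore the normal bag
    by at least `margin` (hypothetical default of 1.0)."""
    gap = bag_score(anom_scores, anom_logits) - bag_score(norm_scores, norm_logits)
    return max(0.0, margin - gap)


# Toy example: the attention concentrates on the one high-scoring segment.
loss = ranking_loss([0.1, 0.9, 0.1], [0.0, 5.0, 0.0],
                    [0.1, 0.1, 0.1], [0.0, 0.0, 0.0])
```

In this toy setup the loss falls to zero once the attention-weighted score of the anomalous video exceeds that of the normal video by the margin, which is the kind of separation the attention block is meant to encourage.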

If this is right

The motion-aware feature by itself matches previous state-of-the-art accuracy.
Combining the motion-aware feature with existing methods produces significant further gains.
The attention weights improve separation between anomalous and normal video segments.
The combined system achieves large-margin gains on both anomaly detection and anomalous action recognition in UCF Crime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same motion-plus-attention structure could be tested on datasets with different anomaly types to check whether motion remains the dominant cue.
If attention weights prove stable across domains, the approach might reduce the need for frame-level labels in weakly supervised video tasks.
Real-time deployment would require checking whether the temporal augmented network adds acceptable latency to live video streams.
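The latency question in the last point can be checked with a simple timing harness. The sketch below is only an illustration: `process_segment` stands in for whatever feature extractor is deployed, and `measure_latency`, `keeps_up` and the warm-up count are hypothetical names and choices, not part of the paper.

```python
import time


def measure_latency(process_segment, segments, warmup=2):
    """Time a per-segment callable and return basic latency statistics."""
    if not segments:
        raise ValueError("at least one segment is required")
    # Warm-up calls are executed but not timed.
    for segment in segments[:warmup]:
        process_segment(segment)
    timings = []
    for segment in segments:
        start = time.perf_counter()
        process_segment(segment)
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": sum(timings) / len(timings),
        "max_s": max(timings),
        "count": len(timings),
    }


def keeps_up(mean_latency_s, segment_duration_s):
    """A stream keeps up if a segment is processed no slower than it plays."""
    return mean_latency_s <= segment_duration_s


# Dummy workload standing in for the real network (assumption).
stats = measure_latency(lambda segment: sum(segment), [[1, 2, 3]] * 5)
```

Comparing `stats["mean_s"]` with the wall-clock duration of one video segment gives a first indication of whether the extractor could run on a live stream.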

Load-bearing premise

The attention block will generate weights that reliably separate anomalous from normal segments on video data outside the training distribution.

What would settle it

Apply the trained model to a new video anomaly dataset whose anomalies are driven by appearance rather than motion or lack clear temporal localization, and measure whether the reported performance margin disappears.

Figures

Figures reproduced from arXiv: 1907.10211 by Shawn Newsam, Yi Zhu.

**Figure 2.** Overall framework. We first obtain the motion-aware feature and then compute the …

**Figure 3.** Visual examples of prediction results. For the anomalous frames, our model is able …

Original abstract

Motivated by our observation that motion information is the key to good anomaly detection performance in video, we propose a temporal augmented network to learn a motion-aware feature. This feature alone can achieve competitive performance with previous state-of-the-art methods, and when combined with them, can achieve significant performance improvements. Furthermore, we incorporate temporal context into the Multiple Instance Learning (MIL) ranking model by using an attention block. The learned attention weights can help to differentiate between anomalous and normal video segments better. With the proposed motion-aware feature and the temporal MIL ranking model, we outperform previous approaches by a large margin on both anomaly detection and anomalous action recognition tasks in the UCF Crime dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper adds a temporal network for motion features and attention inside MIL ranking, but the large-margin claims rest on experiments not shown in the abstract.

The core contribution here is a temporal augmented network that produces motion-aware features for video anomaly detection, plus an attention block added to the standard MIL ranking model. The motion feature by itself is presented as competitive with prior work, and combining it with existing methods is said to yield significant gains. The attention is meant to help the MIL model focus on anomalous segments over normal ones using temporal context.

This is a concrete design tweak rather than a wholesale new framework. It builds directly on the observation that motion drives good performance in this task, which aligns with common sense in surveillance video analysis. The UCF Crime dataset is the main testbed, covering both anomaly detection and anomalous action recognition. The paper does a reasonable job laying out the motivation and describing how the pieces fit together without overclaiming theoretical novelty. The attention mechanism inside MIL is a straightforward way to incorporate temporal information that prior MIL setups for this problem often lacked.

That said, the abstract asserts large-margin outperformance and significant improvements without any numbers, baselines, ablation results, or error breakdowns. All the performance claims are unverified assertions at this stage, which makes it difficult to assess whether the gains come from the motion feature, the attention, or something else. The stress-test point about the attention weights failing to generalize outside the UCF Crime training distribution is worth checking in the full paper, since these models are trained end-to-end and anomaly distributions in deployment often shift. The free parameters in the temporal network and attention weights are standard for this kind of learned pipeline, so no unusual circularity there.

This work is aimed at researchers doing practical video anomaly detection in computer vision, especially those already using MIL ranking on datasets like UCF Crime. A reader focused on incremental engineering improvements to feature extraction and ranking models could extract some value if the experiments hold up under scrutiny. It is coherent on its own terms and shows honest engagement with the literature, so it deserves a serious referee to examine the full results and ablations rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a motion-aware feature extracted using a temporal augmented network for video anomaly detection. This feature is shown to be competitive with prior state-of-the-art methods on its own and to provide significant improvements when integrated with existing approaches. The authors further enhance the Multiple Instance Learning (MIL) ranking model by incorporating an attention block to leverage temporal context, claiming that the learned attention weights better distinguish anomalous from normal video segments. Using this combination, the paper reports outperforming previous methods by a large margin on both anomaly detection and anomalous action recognition tasks on the UCF Crime dataset.

Significance. If the empirical claims hold, this work would be moderately significant for the video anomaly detection community by highlighting the value of explicit motion modeling and temporal attention mechanisms. It could encourage more research into hybrid feature and model enhancements on challenging datasets like UCF Crime. The approach is practical and builds directly on existing MIL frameworks.

major comments (2)

[Section describing the temporal MIL model] The central claim that the attention block produces weights reliably differentiating anomalous from normal segments is load-bearing for the reported large-margin gains, yet the construction is an end-to-end empirical fit on UCF Crime with no provided analysis or test of generalization to shifted anomaly distributions outside the training set.
[Abstract] The headline claim of outperforming previous approaches by a large margin on UCF Crime rests on an assertion of performance gains, but the abstract supplies no quantitative numbers, specific baselines, ablation tables, or error analysis to allow verification of the magnitude or attribution of improvements to the motion-aware feature versus the attention component.

minor comments (2)

The abstract would be strengthened by including at least one or two concrete performance metrics (e.g., AUC values) to support the qualitative statements of improvement.
Clarify the exact definition and computation of the motion-aware feature with an equation or pseudocode in the method section to make the temporal augmented network reproducible from the text alone.
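For readers unfamiliar with the metric named in the first minor comment, here is a minimal sketch of how a ROC AUC can be computed from binary labels and anomaly scores. It uses the pairwise-ranking definition of AUC; the function name `roc_auc` and the toy inputs are illustrative assumptions, and the numbers are not results from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random anomalous item is scored
    above a random normal item, counting ties as one half.

    labels: iterable of 0 (normal) or 1 (anomalous)
    scores: iterable of anomaly scores, higher means more anomalous
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: three of the four anomalous/normal pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The quadratic pairwise loop is kept for clarity; for frame-level evaluation on long videos a sort-based implementation or a library routine would be the practical choice.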

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed review and constructive suggestions. We address each major comment below, indicating where revisions will be made to the manuscript.

Referee: Abstract: The headline claim of outperforming previous approaches by a large margin on UCF Crime rests on an assertion of performance gains, but the abstract supplies no quantitative numbers, specific baselines, ablation tables, or error analysis to allow verification of the magnitude or attribution of improvements to the motion-aware feature versus the attention component.

Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised version we will include the key AUC numbers on UCF Crime (both for anomaly detection and action recognition), name the primary baselines, and briefly attribute the gains to the motion-aware feature and the attention-augmented MIL model. revision: yes
Referee: Section describing the temporal MIL model: The central claim that the attention block produces weights reliably differentiating anomalous from normal segments is load-bearing for the reported large-margin gains, yet the construction is an end-to-end empirical fit on UCF Crime with no provided analysis or test of generalization to shifted anomaly distributions outside the training set.

Authors: The attention block is trained end-to-end within the MIL ranking loss on UCF Crime; its utility is demonstrated by the consistent performance lift when it is added. We can strengthen the manuscript by adding attention-weight visualizations on representative normal and anomalous videos to illustrate the differentiation. However, systematic evaluation on deliberately shifted anomaly distributions would require new datasets and experiments that are outside the current scope. revision: partial

standing simulated objections (not resolved)

Explicit generalization tests of the learned attention weights to anomaly distributions that differ substantially from those in UCF Crime

Circularity Check

0 steps flagged

No circularity: empirical pipeline with external benchmarks

The paper presents an empirical method consisting of a motion-aware feature extractor and a temporal MIL ranking model with attention, trained and evaluated on the UCF Crime dataset. No equations, derivations, or self-citations reduce the reported performance gains to quantities defined by the authors' own fitted constants or prior self-referential claims. The central claims rest on experimental comparisons against prior methods on held-out data, which constitutes independent evidence rather than a self-referential loop. This is the expected outcome for a standard computer-vision pipeline paper.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on the learned parameters of the temporal network and attention block, the domain assumption that motion is the dominant cue, and the empirical claim that the UCF Crime dataset is a sufficient testbed; no new physical entities are postulated.

free parameters (2)

temporal network weights
All parameters of the temporal augmented network are fitted to training data.
attention weights
The attention block parameters are learned during MIL training.

axioms (1)

domain assumption: Motion information is the key to good anomaly detection performance in video.
Explicitly stated as the motivating observation in the abstract.

invented entities (1)

motion-aware feature (no independent evidence)
purpose: A learned representation intended to capture motion cues for anomaly detection.
Introduced as the output of the temporal augmented network; no independent evidence outside the training process is provided.

pith-pipeline@v0.9.0 · 5635 in / 1300 out tokens · 24500 ms · 2026-05-24T17:20:56.584833+00:00 · methodology

