Localizing Unseen Activities in Video via Image Query

arXiv: 1906.12165 · v1 · pith:JTLF7JO7 · submitted 2019-06-28 · cs.CV · cs.IR

Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai

keywords: image, activities, localization, unseen, queries, video, action, contents

Action localization in untrimmed videos is an important topic in video understanding. However, existing action localization methods are restricted to a pre-defined set of actions and cannot localize unseen activities. We therefore consider a new task, Image-Based Activity Localization, which localizes unseen activities in videos via image queries. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential contents in image queries; (2) how to deal with the fuzzy localization caused by inaccurate image queries; (3) how to determine the precise boundaries of target segments. We propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. Next, we adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset, ActivityIBAL, by reorganizing the ActivityNet dataset. Extensive experiments show the effectiveness of our method.
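
To make the idea of region self-attention with relative position encoding more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the abstract gives no equations, so everything here is an assumption for illustration. The function name `region_self_attention`, the grid-offset bias table `rel_bias`, and the use of raw features as queries, keys, and values (no learned projections) are all hypothetical. The sketch only shows the generic pattern of adding a bias that depends on the relative offset between two image regions to their scaled dot-product attention score.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def region_self_attention(features, positions, rel_bias):
    """Self-attention over image regions with a relative position bias.

    features:  list of n region vectors, each of length d
    positions: list of n (row, col) grid coordinates, one per region
    rel_bias:  dict mapping a (d_row, d_col) offset to a scalar bias
               (hypothetical; in a trained model this would be learned)

    Returns (outputs, weights): the attended region vectors and the
    n x n attention weight matrix.
    """
    n = len(features)
    d = len(features[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            # Content term: scaled dot product between regions i and j.
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            # Position term: bias looked up by the offset from i to j.
            offset = (positions[j][0] - positions[i][0],
                      positions[j][1] - positions[i][1])
            scores.append(dot / scale + rel_bias.get(offset, 0.0))
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all region features.
        outputs.append([sum(w[j] * features[j][k] for j in range(n))
                        for k in range(d)])
    return outputs, weights


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pos = [(0, 0), (0, 1), (1, 0)]
    # Strongly favor the region one column to the right (assumed bias value).
    out, attn = region_self_attention(feats, pos, {(0, 1): 10.0})
    print(attn[0])
```

With the large bias on offset `(0, 1)`, the first region attends almost entirely to its right-hand neighbor, which illustrates how position information can reshape attention independently of content similarity.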
