Learning spatiotemporal features with 3D convolutional networks
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2019 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Introduces HCSA, a hierarchical convolutional self-attention network for efficient long-form video QA with question-aware dependency modeling.
citing papers explorer
- Localizing Unseen Activities in Video via Image Query
  Introduces Image-Based Activity Localization task for unseen activities, a self-attention interaction localizer using region self-attention and local transformer, and the ActivityIBAL dataset from ActivityNet.
- Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks
  Introduces HCSA, a hierarchical convolutional self-attention network for efficient long-form video QA with question-aware dependency modeling.