Attention is all you need
3 Pith papers cite this work. Polarity classification is still being indexed.
Years: 2019 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Localizing Unseen Activities in Video via Image Query
  Introduces Image-Based Activity Localization task for unseen activities, a self-attention interaction localizer using region self-attention and local transformer, and the ActivityIBAL dataset from ActivityNet.
- Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks
  Introduces HCSA, a hierarchical convolutional self-attention network for efficient long-form video QA with question-aware dependency modeling.
- Sequence Generation: From Both Sides to the Middle
  SBSG model generates sequences bidirectionally from ends to middle via interactive attention, claiming faster decoding and better quality than autoregressive Transformer on NMT and summarization tasks.