Uncovering the temporal context for video question answering
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2019 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Two-stream Spatiotemporal Feature for Video QA Task
A two-stream spatiotemporal feature extractor with squeeze-and-excitation and attention-based context matching improves text-only video QA on TVQA but shows limitations with visual features.