CSTA: CNN-based Spatiotemporal Attention for Video Summarization
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets
SD-MVSum extends script-driven video summarization to multimodal inputs by modeling script-video and script-transcript relevance with a new weighted cross-modal attention mechanism, plus extended S-VideoXum and MrHiSum datasets.