SD-MVSum extends script-driven video summarization to multimodal inputs by modeling script-video and script-transcript relevance with a new weighted cross-modal attention mechanism, plus extended S-VideoXum and MrHiSum datasets.
Silero VAD: pre-trained enterprise-grade voice activity detector (VAD), number detector and language classifier. https://github.com/snakers4/silero-vad, 2024
1 Pith paper cites this work.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets