Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

Alexander Hauptmann; Bei Liu; Jianlong Fu; Qin Jin; Shizhe Chen; Yida Zhao; Yuqing Song; Zhaoyang Zeng

arxiv: 1907.05092 · v1 · pith:N7PYMJ6Nnew · submitted 2019-07-11 · 💻 cs.CV · cs.CL · cs.LG

Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

classification 💻 cs.CV cs.CL cs.LG

keywords captioning events video contexts models task challenge dense

Contextual reasoning is essential for understanding events in long untrimmed videos. In this work, we systematically explore different captioning models with various contexts for the dense-captioning events in video task, which aims to generate captions for the different events in an untrimmed video. We propose five types of contexts and two categories of event captioning models, and evaluate their contributions to event captioning in terms of both accuracy and diversity. The proposed captioning models are plugged into our pipeline system for the dense video captioning challenge. The overall system achieves state-of-the-art performance on the dense-captioning events in video task, with a METEOR score of 9.91 on the challenge testing set.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks
cs.CV 2024-12 unverdicted novelty 7.0

HumanVBench provides a 16-task benchmark for human-centric video understanding in MLLMs, created through automated annotation and distractor synthesis pipelines, and shows top models lag human performance on emotion p...