arxiv: 1907.05092 · v1 · pith:N7PYMJ6Nnew · submitted 2019-07-11 · 💻 cs.CV · cs.CL · cs.LG

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

classification 💻 cs.CV cs.CL cs.LG
keywords captioning, events, video, contexts, models, task, challenge, dense

Contextual reasoning is essential for understanding events in long untrimmed videos. In this work, we systematically explore different captioning models with various contexts for the dense-captioning events in videos task, which aims to generate captions for the different events in an untrimmed video. We propose five types of contexts as well as two categories of event captioning models, and evaluate their contributions to event captioning in terms of both accuracy and diversity. The proposed captioning models are plugged into our pipeline system for the dense video captioning challenge. The overall system achieves state-of-the-art performance on the dense-captioning events in videos task, with a METEOR score of 9.91 on the challenge test set.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

    cs.CV 2024-12 unverdicted novelty 7.0

    HumanVBench provides a 16-task benchmark for human-centric video understanding in MLLMs, created through automated annotation and distractor synthesis pipelines, and shows top models lag human performance on emotion p...