Learning modality interaction for temporal sentence localization and event captioning in videos
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning
DiffVC applies diffusion models to non-autoregressive (non-AR) video captioning, outperforming prior non-AR methods and matching autoregressive (AR) methods in quality while being faster on standard benchmarks.