M3: Multimodal memory modelling for video captioning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning
DiffVC applies diffusion models to non-autoregressive (non-AR) video captioning, outperforming prior non-AR methods and matching autoregressive (AR) ones in quality while running faster on standard benchmarks.