Improved baselines with visual instruction tuning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2024 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
- PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
A temporal pooling layer added to LLaVA smooths video feature distributions and lifts performance on dense video captioning and QA to new SOTA levels without extra parameters.