Video-LLaMA: An instruction-tuned audio-visual language model for video understanding

Hang Zhang, Xin Li, Lidong Bing · 2023

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

cs.CV · 2024-04-25 · conditional · novelty 5.0

Adding a temporal pooling layer to LLaVA smooths the distribution of video features and raises performance on dense video captioning and video QA to new state-of-the-art levels without extra parameters.