
Disentangled representation learning for text-video retrieval

5 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

fields

cs.CV 3 · cs.IR 2

verdicts

UNVERDICTED 5

roles

background 2

polarities

background 2

representative citing papers

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

cs.CV · 2024-11-04 · unverdicted · novelty 6.0

PPLLaVA uses CLIP-based alignment and prompt-guided convolution-style pooling to reduce visual tokens 18x in Video LLMs, achieving SOTA results on captioning, QA, and long-form reasoning benchmarks with higher throughput.
