Pioneers ViPro, the first attack to adversarially promote videos in text-to-video retrieval, using Modal Refinement to improve black-box transferability across multiple targets.
Clip2tv: An empirical study on transformer-based methods for video-text retrieval
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 2polarities
background 2representative citing papers
Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
Short, simple captions describing single actions achieve higher retrieval recall than complex multi-step or fine-grained scene descriptions across all tested models.
citing papers explorer
-
Adversarial Video Promotion Against Text-to-Video Retrieval
Pioneers ViPro, the first attack to adversarially promote videos in text-to-video retrieval, using Modal Refinement to improve black-box transferability across multiple targets.
-
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
-
Understanding the Performance Plateau in Text-to-Video Retrieval: A Comprehensive Empirical and Linguistic Analysis
Short, simple captions describing single actions achieve higher retrieval recall than complex multi-step or fine-grained scene descriptions across all tested models.