Socratic Models achieve zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information, enabling new capabilities without finetuning.
arXiv preprint arXiv:2010.07999 (2020)
2 Pith papers cite this work. Polarity classification is still in progress.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Roles: baseline (1)
Polarities: baseline (1)
Representative citing papers:
VISTA mines multi-level event semantics via visual prompts, knowledge-enhanced retrieval, and proposal integration to improve long-video event prediction over existing LVLMs.
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language