Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
Language models can see: Plugging visual controls in text generation
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
A single model trained only on image-text pairs gains instruction-following ability across images, video, and audio by routing all modalities through ImageBind's shared embedding space into Vicuna.
A holistic survey of affective computing for intelligent agents covering emotion understanding via multimodal data, affective cognition, emotional expression synthesis, key challenges, and future directions emphasizing generative technologies.
citing papers explorer
-
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
-
Intelligent Agents with Emotional Intelligence: Current Trends, Challenges, and Future Prospects
A holistic survey of affective computing for intelligent agents covering emotion understanding via multimodal data, affective cognition, emotional expression synthesis, key challenges, and future directions emphasizing generative technologies.