Emu: Generative Pretraining in Multimodality
Emu is a multimodal foundation model that unifies image and text generation via autoregressive pretraining on interleaved multimodal data, showing strong zero-shot performance on captioning, VQA, and text-to-image tasks.