X-lxmert: Paint, caption and answer questions with multi-modal transformers

Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi · 2009 · arXiv 2009.11278

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

read on arXiv browse 1 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

cs.CV · 2022-06-22 · unverdicted · novelty 6.0

Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

citing papers explorer

Showing 1 of 1 citing paper.

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation cs.CV · 2022-06-22 · unverdicted · none · ref 63
Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

X-lxmert: Paint, caption and answer questions with multi-modal transformers

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer