GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
Uniter: Learning universal image-text representations
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A patch-based fusion method extends CLIP to high-resolution images by retaining multi-scale details for improved class-prompted retrieval.
Introduces GRIT, LTMI, and a hierarchical attention framework claiming performance gains on image captioning, visual dialog, and ALFRED instruction following.
citing papers explorer
-
DetailCLIP: Injecting Image Details into CLIP's Feature Space
A patch-based fusion method extends CLIP to high-resolution images by retaining multi-scale details for improved class-prompted retrieval.
-
Machine Intelligence that Understands Visual and Linguistic Information and Interacts with Humans and Environments
Introduces GRIT, LTMI, and a hierarchical attention framework claiming performance gains on image captioning, visual dialog, and ALFRED instruction following.