Canonical reference

Visual instruction tuning

Canonical reference. 80% of citing Pith papers cite this work as background.

7 Pith papers citing it
Background 80% of classified citations

citation-role summary

roles: background 5

citation-polarity summary

polarities: background 4, support 1

representative citing papers

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
