[2021]: $L_{\text{VAE}} = L_1 + L_{\text{LPIPS}} + 0.5\,L_{\text{GAN}} + 0.2\,L_{\text{ID}} + 0.000001\,L_{\text{KL}}$, where $L_1$ is the L1 loss in pixel space and $L_{\text{LPIPS}}$ is a perceptual loss based on LPIPS similarity Zhang et al

A Autoencoder Details: The training objective for our VAE closely follows that of Esser et al · 2021
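
The weighted sum quoted in this citation context can be illustrated with a minimal Python sketch. The coefficients are the ones in the snippet; the dictionary keys, the function name `combine_vae_loss`, and the example per-term values are hypothetical and chosen only for illustration. The sketch does not compute the individual losses (L1, LPIPS, GAN, ID, KL), only how they would be combined once available.

```python
# Coefficients as quoted in the snippet; the key names are shorthand
# invented for this sketch, not identifiers from the cited paper.
VAE_LOSS_WEIGHTS = {
    "l1": 1.0,        # L1 loss in pixel space
    "lpips": 1.0,     # perceptual loss based on LPIPS similarity
    "gan": 0.5,
    "id": 0.2,
    "kl": 0.000001,
}


def combine_vae_loss(terms, weights=VAE_LOSS_WEIGHTS):
    """Return the weighted sum of already-computed scalar loss terms."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical per-term values, not results from any experiment.
example_terms = {"l1": 0.10, "lpips": 0.30, "gan": 0.80, "id": 0.50, "kl": 1000.0}
total = combine_vae_loss(example_terms)
print(total)  # approximately 0.901
```

With these made-up inputs, the KL term contributes only 0.001 despite its raw value of 1000, which shows how small the 0.000001 coefficient makes its share of the total.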

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

cs.AI · 2024-08-20 · unverdicted · novelty 6.0

A single transformer combines language modeling loss and diffusion loss on mixed-modality data, scaling to 7B parameters and 2T tokens while matching specialized language and diffusion models.

citing papers explorer

Showing 1 of 1 citing paper.

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model · cs.AI · 2024-08-20 · unverdicted · none · ref 26
