CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
Laion-5b: An open large-scale dataset for training next generation image-text models.Advances in neural information processing systems, 35:25278–25294
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
UniVL unifies vision and language into one mask-rendered input processed by an OCR backbone to condition diffusion models for spatially grounded image generation without a standalone text encoder.
InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
BAF reduces memorization in diffusion LoRAs by filtering spectral channels of the adaptation weights that show weak alignment with the base model's principal subspace.
Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.
SPACE induces sparsity in cross-attention parameters via closed-form iterative updates to erase target concepts more effectively than dense baselines in large diffusion models.
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
citing papers explorer
-
Emu3.5: Native Multimodal Models are World Learners
Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.
-
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.