Instantbooth: Personalized text-to-image generation without test-time finetuning

· 2023 · arXiv 2304.03411

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

cs.CV · 2023-07-10 · unverdicted · novelty 7.0

A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

cs.CV · 2023-10-30 · unverdicted · novelty 6.0

Open-source text-to-video and image-to-video diffusion models generate high-quality 1024x576 videos, with the I2V variant claimed as the first to strictly preserve reference image content.

SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation

cs.CV · 2024-11-28 · unverdicted · novelty 5.0

SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.

citing papers explorer

Showing 3 of 3 citing papers.

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023-07-10 · unverdicted · none · ref 19
A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation cs.CV · 2023-10-30 · unverdicted · none · ref 44
Open-source text-to-video and image-to-video diffusion models generate high-quality 1024x576 videos, with the I2V variant claimed as the first to strictly preserve reference image content.
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation cs.CV · 2024-11-28 · unverdicted · none · ref 48
SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.

Instantbooth: Personalized text-to-image generation without test-time finetuning

fields

years

verdicts

representative citing papers

citing papers explorer