LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation,
6 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 6representative citing papers
AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.
UniEmo unifies emotional understanding and generation by extracting multi-scale features via learnable expert queries, guiding diffusion-based image generation, and using dual feedback to improve both tasks.
InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.
SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.
SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.
citing papers explorer
-
LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization
LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
-
AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models
AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.
-
UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries
UniEmo unifies emotional understanding and generation by extracting multi-scale features via learnable expert queries, guiding diffusion-based image generation, and using dual feedback to improve both tasks.
-
InstantID: Zero-shot Identity-Preserving Generation in Seconds
InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.
-
SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation
SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.
-
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation
SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.