Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation,

· 2023 · arXiv 2302.13848

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

representative citing papers

LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization

cs.GR · 2026-01-08 · unverdicted · novelty 7.0

LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.

AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.

UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries

cs.CV · 2025-07-31 · unverdicted · novelty 6.0

UniEmo unifies emotional understanding and generation by extracting multi-scale features via learnable expert queries, guiding diffusion-based image generation, and using dual feedback to improve both tasks.

InstantID: Zero-shot Identity-Preserving Generation in Seconds

cs.CV · 2024-01-15 · unverdicted · novelty 6.0

InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation

cs.CV · 2025-06-30 · unverdicted · novelty 5.0

SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.

SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation

cs.CV · 2024-11-28 · unverdicted · novelty 5.0

SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.

citing papers explorer

Showing 6 of 6 citing papers.

LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization cs.GR · 2026-01-08 · unverdicted · none · ref 43
LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models cs.CV · 2026-05-20 · unverdicted · none · ref 36
AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.
UniEmo: Unifying Emotional Understanding and Generation with Learnable Expert Queries cs.CV · 2025-07-31 · unverdicted · none · ref 50
UniEmo unifies emotional understanding and generation by extracting multi-scale features via learnable expert queries, guiding diffusion-based image generation, and using dual feedback to improve both tasks.
InstantID: Zero-shot Identity-Preserving Generation in Seconds cs.CV · 2024-01-15 · unverdicted · none · ref 22
InstantID enables zero-shot identity-preserving image generation from one facial image via a novel IdentityNet that combines strong semantic and weak spatial conditioning with text prompts in diffusion models.
SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation cs.CV · 2025-06-30 · unverdicted · none · ref 88
SynMotion combines disentangled semantic embeddings, parameter-efficient motion adapters, and alternate subject-motion training on a new SPV dataset to improve motion customization in text-to-video and image-to-video generation.
SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation cs.CV · 2024-11-28 · unverdicted · none · ref 42
SOW uses MLLMs and attention to selectively control unidirectional diffusion for pixel-level fidelity and contextual coherence in text-vision-to-image tasks.

Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation,

fields

years

verdicts

representative citing papers

citing papers explorer