
arxiv: 2503.23353 · v1 · pith:VR5NTDTDnew · submitted 2025-03-30 · 💻 cs.CV · cs.AI

Object Isolated Attention for Consistent Story Visualization

classification 💻 cs.CV cs.AI
keywords attention · character · consistency · isolated · cross · features · mechanism · methods

Open-ended story visualization is a challenging task that involves generating coherent image sequences from a given storyline. One of the main difficulties is maintaining character consistency while creating natural and contextually fitting scenes, an area where many existing methods struggle. In this paper, we propose an enhanced Transformer module that uses separate self-attention and cross-attention mechanisms, leveraging prior knowledge from pre-trained diffusion models to ensure logical scene creation. The isolated self-attention mechanism improves character consistency by refining attention maps to reduce focus on irrelevant areas and highlight key features of the same character. Meanwhile, the isolated cross-attention mechanism processes each character's features independently, avoiding feature fusion and further strengthening consistency. Notably, our method is training-free, allowing the continuous generation of new characters and storylines without re-tuning. Both qualitative and quantitative evaluations show that our approach outperforms current methods.
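
The abstract does not say how the isolation is implemented, so the following is only a minimal sketch of one plausible reading: attention restricted by an ownership mask, so that a query belonging to one character can only attend to keys belonging to that same character. The function names `masked_attention` and `isolation_mask`, the owner labels, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math

def masked_attention(queries, keys, values, allow):
    """Scaled dot-product attention; allow[i][j] lets query i attend to key j."""
    dim = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score only the keys this query is permitted to see.
        scores = {
            j: sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for j, k in enumerate(keys) if allow[i][j]
        }
        if not scores:
            raise ValueError(f"query {i} has no permitted keys")
        # Numerically stable softmax over the permitted keys only.
        top = max(scores.values())
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        weights = [exps.get(j, 0.0) / total for j in range(len(keys))]
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
        all_weights.append(weights)
    return outputs, all_weights

def isolation_mask(query_owner, key_owner):
    """Assumed rule: a query may attend only to keys with the same owner label."""
    return [[qo == ko for ko in key_owner] for qo in query_owner]

# Toy data (hypothetical): three image-patch queries, three text-token keys.
query_owner = ["A", "A", "B"]
key_owner = ["A", "B", "B"]
queries = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

outputs, weights = masked_attention(
    queries, keys, values, isolation_mask(query_owner, key_owner)
)
```

In this toy setup, character A's queries receive zero weight on character B's keys and vice versa, which is the sense in which features are kept from fusing. How the paper actually derives character regions and refines the attention maps is not stated in the abstract.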



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LCG: Long-Context Consistent Image Generation with Sparse Relational Attention

    cs.CV · 2026-06 · unverdicted · novelty 5.0

    LCG introduces Sparse Relational Attention and Routing Consistency Constraint to produce consistent multi-image text-to-image sequences and reports better prompt alignment and character consistency than baselines on a...