SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency
3 Pith papers cite this work.
Fields: cs.CV (3)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- Setting the Stage: Text-Driven Scene-Consistent Image Generation
  A new data pipeline using real photos, entity removal, and image-to-video models plus a cross-view attention loss enables text-driven generation of actors in reference scenes with improved alignment.
- OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
  OmniShow unifies text, image, audio, and pose conditions into an end-to-end model for high-quality human-object interaction video generation and introduces the HOIVG-Bench benchmark, claiming state-of-the-art results.
- HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images
  HiFi-Inpaint delivers state-of-the-art detail-preserving human-product images by adding Shared Enhancement Attention and Detail-Aware Loss to reference-based inpainting on a new 40K dataset.