MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
Retrieval from motion datasets combined with LLM task parsing and reward-guided noise initialization enables training-free diffusion optimization to satisfy severe spatiotemporal constraints in human motion generation.
AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.
citing papers explorer
- VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset
  VINS-120K supplies the first large-scale set of instruction-image-edited-image triplets at ultra-high resolution together with an adaptation strategy that improves detail synthesis.
- Towards Highly-Constrained Human Motion Generation with Retrieval-Guided Diffusion Noise Optimization
  Retrieval from motion datasets combined with LLM task parsing and reward-guided noise initialization enables training-free diffusion optimization to satisfy severe spatiotemporal constraints in human motion generation.
- AttriStory: Fine-grained Attribute Realization for Visual Storytelling with Diffusion Models
  AttriStory adds a benchmark and AttriLoss-based latent optimization to improve faithful rendering of fine-grained attributes such as clothing color and texture in diffusion-model visual storytelling.