Fashion130K dataset and UMC framework align text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.
arXiv preprint arXiv:2403.10783
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.
Introduces dual pose-image representation, cross-modal alignment, and iterative construction to improve prompt alignment and diversity in multi-person text-to-image generation.
citing papers explorer
- Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
  Fashion130K dataset and UMC framework align text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.
- VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis
  VersaVogue unifies garment generation and virtual dressing via trait-routing attention with mixture-of-experts and an automated multi-perspective preference optimization pipeline that uses DPO without human labels.
- Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes
  Introduces dual pose-image representation, cross-modal alignment, and iterative construction to improve prompt alignment and diversity in multi-person text-to-image generation.