InstructMoLE replaces per-token routing with instruction-guided global routing for mixture-of-low-rank-experts in diffusion transformers and adds an output-space orthogonality loss to improve multi-conditional image generation.
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
6 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 6representative citing papers
VicoEdit performs training-free image editing by transforming source images directly with visual context and concept-alignment-guided posterior sampling, outperforming training-based methods.
Premier learns user-specific embeddings to modulate text-to-image generation, outperforming prior methods on preference alignment, text consistency, and expert ratings even with limited history.
Scone unifies subject understanding and generation in a two-stage trained model to improve both composition and distinction in multi-subject image generation, outperforming prior open-source models on new benchmarks.
OPAD enables reliable high-quality personalization of one-step diffusion models via multi-step teacher distillation combined with adversarial alignment losses.
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.
citing papers explorer
-
InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation
InstructMoLE replaces per-token routing with instruction-guided global routing for mixture-of-low-rank-experts in diffusion transformers and adds an output-space orthogonality loss to improve multi-conditional image generation.
-
Training-Free Image Editing with Visual Context Integration and Concept Alignment
VicoEdit performs training-free image editing by transforming source images directly with visual context and concept-alignment-guided posterior sampling, outperforming training-based methods.
-
Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation
Premier learns user-specific embeddings to modulate text-to-image generation, outperforming prior methods on preference alignment, text consistency, and expert ratings even with limited history.
-
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
Scone unifies subject understanding and generation in a two-stage trained model to improve both composition and distinction in multi-subject image generation, outperforming prior open-source models on new benchmarks.
-
Adversarial Concept Distillation for One-Step Diffusion Personalization
OPAD enables reliable high-quality personalization of one-step diffusion models via multi-step teacher distillation combined with adversarial alignment losses.
-
PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.