XVerse: Consistent multi-subject control of identity and semantic attributes via DiT modulation
9 Pith papers cite this work.
Fields: cs.CV (9)
Representative citing papers:
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.
Citing papers explorer
-
InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation
InstructMoLE replaces per-token routing with instruction-guided global routing for mixture-of-low-rank-experts in diffusion transformers and adds an output-space orthogonality loss to improve multi-conditional image generation.
-
MIBE: Multi-subject Interaction Benchmark and Evaluator for Personalized Image Generation
MIBE introduces a multi-subject interaction benchmark (MIB) with silver and gold sets and a dual-head evaluator (MIE) trained on VLM labels that outperforms baselines in matching human judgments.
-
Scaling Multi-Reference Image Generation with Dynamic Reward Optimization
Introduces the OmniRef-Bench benchmark and DyRef, a two-stage framework that uses Difficulty-aware Advantage Reweighting and Discriminative Reward Scaling to improve open-source models on complex multi-reference image generation.
-
Training-Free Image Editing with Visual Context Integration and Concept Alignment
VicoEdit performs training-free image editing by transforming source images directly with visual context and concept-alignment-guided posterior sampling, outperforming training-based methods.
-
Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation
Premier learns user-specific embeddings to modulate text-to-image generation, outperforming prior methods on preference alignment, text consistency, and expert ratings even with limited history.
-
Adversarial Concept Distillation for One-Step Diffusion Personalization
OPAD enables reliable, high-quality personalization of one-step diffusion models by combining multi-step teacher distillation with adversarial alignment losses.
-
UniVerse: A Unified Modulation Framework for Segmentation-Free, Disentangled Multi-Concept Personalization
UniVerse proposes a unified modulation framework for segmentation-free, disentangled multi-concept personalization in diffusion transformers, claiming superior localization and fidelity over baselines.