hub

Less-to-more generalization: Unlocking more controllability by in-context generation

Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He, “Less-to-more generalization: Unlocking more controllability by in-context generation,”arXiv preprint arXiv:2504 · 2025 · arXiv 2504.02160

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 baseline 1

citation-polarity summary

background 3 baseline 1

representative citing papers

Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A new 100k triplet dataset and in-context diffusion framework ICTone enable state-of-the-art tone style transfer by jointly conditioning on content and reference images with scorer-based reward learning.

ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.

DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation

cs.CV · 2026-03-09 · unverdicted · novelty 7.0

DSH-Bench is a benchmark for subject-driven T2I generation that uses hierarchical taxonomy sampling, difficulty/scenario classification, and a new SICS metric showing 9.4% higher human correlation than prior measures.

InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation

cs.CV · 2025-12-25 · unverdicted · novelty 7.0

InstructMoLE replaces per-token routing with instruction-guided global routing for mixture-of-low-rank-experts in diffusion transformers and adds an output-space orthogonality loss to improve multi-conditional image generation.

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

cs.CV · 2025-04-29 · unverdicted · novelty 7.0

ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.

Lance: Unified Multimodal Modeling by Multi-Task Synergy

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.

Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Fashion130K dataset and UMC framework align text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

Vanast produces coherent garment-transferred human animation videos from a single human image, garment images, and pose guidance video using synthetic triplet supervision and a Dual Module video diffusion transformer architecture.

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

cs.CV · 2025-12-14 · conditional · novelty 6.0

Scone unifies subject understanding and generation in a two-stage trained model to improve both composition and distinction in multi-subject image generation, outperforming prior open-source models on new benchmarks.

Emu3.5: Native Multimodal Models are World Learners

cs.CV · 2025-10-30 · unverdicted · novelty 6.0

Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.

Adversarial Concept Distillation for One-Step Diffusion Personalization

cs.CV · 2025-10-23 · unverdicted · novelty 6.0

OPAD enables reliable high-quality personalization of one-step diffusion models via multi-step teacher distillation combined with adversarial alignment losses.

FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation

cs.CV · 2025-04-22 · unverdicted · novelty 6.0

FreeGraftor performs subject-driven text-to-image generation without training by cross-image feature grafting via semantic matching, position-constrained attention fusion, and a noise initialization strategy that preserves reference geometry.

ID-Sim: An Identity-Focused Similarity Metric

cs.CV · 2026-04-06 · unverdicted · novelty 5.0

ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retrieval, and generative tasks.

PureCC: Pure Learning for Text-to-Image Concept Customization

cs.CV · 2026-03-08 · unverdicted · novelty 5.0

PureCC introduces a decoupled learning objective, dual-branch training pipeline with frozen extractor, and adaptive guidance scale λ* for high-fidelity concept customization while preserving original model behavior in text-to-image generation.

PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards

cs.CV · 2025-12-01 · conditional · novelty 5.0

A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.

OmniGen2: Towards Instruction-Aligned Multimodal Generation

cs.CV · 2025-06-23 · unverdicted · novelty 5.0

OmniGen2 introduces a unified generative model with two distinct decoding pathways and a decoupled image tokenizer that achieves competitive results on text-to-image and editing benchmarks plus state-of-the-art consistency among open-source models on the new OmniContext benchmark.

citing papers explorer

Showing 16 of 16 citing papers.

Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset cs.CV · 2026-04-17 · unverdicted · none · ref 44
A new 100k triplet dataset and in-context diffusion framework ICTone enable state-of-the-art tone style transfer by jointly conditioning on content and reference images with scorer-based reward learning.
ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding cs.CV · 2026-04-15 · unverdicted · none · ref 43
ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.
DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation cs.CV · 2026-03-09 · unverdicted · none · ref 67
DSH-Bench is a benchmark for subject-driven T2I generation that uses hierarchical taxonomy sampling, difficulty/scenario classification, and a new SICS metric showing 9.4% higher human correlation than prior measures.
InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation cs.CV · 2025-12-25 · unverdicted · none · ref 34
InstructMoLE replaces per-token routing with instruction-guided global routing for mixture-of-low-rank-experts in diffusion transformers and adds an output-space orthogonality loss to improve multi-conditional image generation.
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer cs.CV · 2025-04-29 · unverdicted · none · ref 28
ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.
Lance: Unified Multimodal Modeling by Multi-Task Synergy cs.CV · 2026-05-18 · unverdicted · none · ref 128 · 2 links
Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.
Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition cs.CV · 2026-05-11 · unverdicted · none · ref 51 · 2 links
Fashion130K dataset and UMC framework align text and visual prompts to generate more consistent fashion outfits than prior state-of-the-art methods.
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision cs.CV · 2026-04-06 · unverdicted · none · ref 31
Vanast produces coherent garment-transferred human animation videos from a single human image, garment images, and pose guidance video using synthetic triplet supervision and a Dual Module video diffusion transformer architecture.
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling cs.CV · 2025-12-14 · conditional · none · ref 39
Scone unifies subject understanding and generation in a two-stage trained model to improve both composition and distinction in multi-subject image generation, outperforming prior open-source models on new benchmarks.
Emu3.5: Native Multimodal Models are World Learners cs.CV · 2025-10-30 · unverdicted · none · ref 109
Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.
Adversarial Concept Distillation for One-Step Diffusion Personalization cs.CV · 2025-10-23 · unverdicted · none · ref 98
OPAD enables reliable high-quality personalization of one-step diffusion models via multi-step teacher distillation combined with adversarial alignment losses.
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation cs.CV · 2025-04-22 · unverdicted · none · ref 11
FreeGraftor performs subject-driven text-to-image generation without training by cross-image feature grafting via semantic matching, position-constrained attention fusion, and a noise initialization strategy that preserves reference geometry.
ID-Sim: An Identity-Focused Similarity Metric cs.CV · 2026-04-06 · unverdicted · none · ref 93
ID-Sim is a new similarity metric that aims to capture human selective sensitivity to identities by training on curated real and generative synthetic data and validating against human annotations on recognition, retrieval, and generative tasks.
PureCC: Pure Learning for Text-to-Image Concept Customization cs.CV · 2026-03-08 · unverdicted · none · ref 46
PureCC introduces a decoupled learning objective, dual-branch training pipeline with frozen extractor, and adaptive guidance scale λ* for high-fidelity concept customization while preserving original model behavior in text-to-image generation.
PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards cs.CV · 2025-12-01 · conditional · none · ref 40
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.
OmniGen2: Towards Instruction-Aligned Multimodal Generation cs.CV · 2025-06-23 · unverdicted · none · ref 79
OmniGen2 introduces a unified generative model with two distinct decoding pathways and a decoupled image tokenizer that achieves competitive results on text-to-image and editing benchmarks plus state-of-the-art consistency among open-source models on the new OmniContext benchmark.

Less-to-more generalization: Unlocking more controllability by in-context generation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer