UniControl: A Unified Diffusion Model for Controllable Visual Generation in the Wild

Qin, C · 2023 · arXiv 2305.11147

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling

cs.CV · 2026-05-19 · conditional · novelty 7.0

MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.

InstanceControl: Controllable Complex Image Generation without Instance Labeling

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

InstanceControl uses VLMs to auto-generate instance masks from text and visual conditions, with adaptive refinement, to enable controllable multi-object image generation without manual labeling.

OmniGen-AR: AutoRegressive Any-to-Image Generation

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

OmniGen-AR is a unified autoregressive framework for any-to-image generation that tokenizes text and visual conditions together and uses disentangled causal attention to support tasks like text-to-image, depth-to-image, image editing, and text-to-video while reporting 0.63 on GenEval and 80.02 on VB

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

UniVidX unifies diverse video generation tasks into one conditional diffusion model using stochastic condition masking, decoupled gated LoRAs, and cross-modal self-attention.

SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

SpatialFusion internalizes 3D geometric awareness into unified image generation models by pairing an MLLM with a spatial transformer that produces depth maps to constrain diffusion generation.

PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning

cs.LG · 2025-09-17 · unverdicted · novelty 6.0

PiERN proposes token-level routing of physically-isolated experts to embed high-precision computation directly into LLMs, reporting higher accuracy and lower latency, token count, and energy use than fine-tuning or multi-agent baselines.

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

cs.CV · 2025-08-20 · unverdicted · novelty 6.0

Ouroboros uses two single-step diffusion models with cycle consistency for forward and inverse rendering, extending intrinsic decomposition to indoor/outdoor scenes with faster inference than multi-step methods.

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

cs.CV · 2026-06-17 · unverdicted · novelty 5.0

Introduces the ProductConsistency dataset and benchmark, along with a Cyclic Consistency reward for fine-tuning image editing models, achieving a 5x reduction in character error rate for product identity preservation.

IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.

UNITY: Attention Flow Networks for Adaptive Conditioning in Diffusion

cs.CV · 2026-06-18 · unverdicted · novelty 4.0

UNITY is a two-stage adapter with Morphable Attention Flow networks for efficient single and composite conditioning in diffusion-based image generation.

citing papers explorer

Showing 9 of 9 citing papers after filters.

InstanceControl: Controllable Complex Image Generation without Instance Labeling

cs.CV · 2026-06-30 · unverdicted · none · ref 36

OmniGen-AR: AutoRegressive Any-to-Image Generation

cs.CV · 2026-06-08 · unverdicted · none · ref 54

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

cs.CV · 2026-05-01 · unverdicted · none · ref 54

SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness

cs.CV · 2026-04-29 · unverdicted · none · ref 39

PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning

cs.LG · 2025-09-17 · unverdicted · none · ref 11

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

cs.CV · 2025-08-20 · unverdicted · none · ref 54

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

cs.CV · 2026-06-17 · unverdicted · none · ref 39

IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations

cs.CV · 2026-05-01 · unverdicted · none · ref 26

UNITY: Attention Flow Networks for Adaptive Conditioning in Diffusion

cs.CV · 2026-06-18 · unverdicted · none · ref 20
