Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, Maneesh Agrawala · 2023

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

NeuroFlow: Toward Unified Visual Encoding and Decoding from Neural Activity

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

NeuroFlow is the first unified flow model for bidirectional visual encoding and decoding from neural activity using NeuroVAE and cross-modal flow matching.

ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction

cs.CV · 2026-04-02 · unverdicted · novelty 7.0

ProDiG progressively transforms aerial Gaussian splats into coherent ground-level 3D reconstructions via diffusion guidance and specialized attention modules.

Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration

cs.CV · 2026-03-17 · unverdicted · novelty 7.0

Face2Scene uses facial restoration as an oracle to derive degradation codes that condition a diffusion model for restoring the entire degraded scene.

LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization

cs.GR · 2026-01-08 · unverdicted · novelty 7.0

LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.

One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

One-to-All Animation enables alignment-free character animation and image pose transfer via self-supervised outpainting reformulation, reference extraction, hybrid fusion attention, identity-robust pose control, and token replacement for long videos.

Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework

cs.CV · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

MagicBokeh uses a single diffusion model with alternative training, focus-aware masked attention, and degradation-aware depth estimation to produce photorealistic bokeh on low-res zoomed images.

Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation

cs.CV · 2026-03-21 · unverdicted · novelty 6.0

Premier learns user-specific embeddings to modulate text-to-image generation, outperforming prior methods on preference alignment, text consistency, and expert ratings even with limited history.

Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

A two-stage method predicts an intermediate Canny map for structure then renders the image conditioned on appearance and structure, paired with a 100k text-aware dataset, to improve detail preservation in subject-driven generation.

BIR-Adapter: A parameter-efficient diffusion adapter for blind image restoration

cs.CV · 2025-09-08 · unverdicted · novelty 5.0

BIR-Adapter adds a parameter-efficient attention adapter and guided sampling to pretrained diffusion models, achieving competitive blind image restoration performance with up to 36x fewer trained parameters and enabling extension to new degradation types.

citing papers explorer

NeuroFlow: Toward Unified Visual Encoding and Decoding from Neural Activity · cs.LG · 2026-04-10 · unverdicted · none · ref 69
ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction · cs.CV · 2026-04-02 · unverdicted · none · ref 47
Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration · cs.CV · 2026-03-17 · unverdicted · none · ref 62
LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization · cs.GR · 2026-01-08 · unverdicted · none · ref 47
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer · cs.CV · 2025-11-28 · unverdicted · none · ref 59
Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework · cs.CV · 2026-05-08 · unverdicted · none · ref 61 · 2 links
Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation · cs.CV · 2026-03-21 · unverdicted · none · ref 47
Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction · cs.CV · 2026-05-20 · unverdicted · none · ref 29
BIR-Adapter: A parameter-efficient diffusion adapter for blind image restoration · cs.CV · 2025-09-08 · unverdicted · none · ref 38
