Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, Maneesh Agrawala · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

A kinematic-to-visual lifting paradigm combined with hierarchically routed control generates action-conditioned surgical videos with better faithfulness, fidelity, and efficiency.

DiffST: Spatiotemporal-Aware Diffusion for Real-World Space-Time Video Super-Resolution

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

DiffST delivers state-of-the-art real-world space-time video super-resolution with 17x faster inference than prior diffusion methods by using one-step sampling, cross-frame context aggregation, and video representation guidance.

FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching

cs.CV · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

FlashClear delivers up to 122x faster object removal than prior diffusion models via adversarial step distillation and asymmetric attention caching while preserving visual quality.

citing papers explorer

Showing 3 of 3 citing papers.

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation cs.CV · 2026-05-09 · unverdicted · none · ref 105
A kinematic-to-visual lifting paradigm combined with hierarchically routed control generates action-conditioned surgical videos with better faithfulness, fidelity, and efficiency.
DiffST: Spatiotemporal-Aware Diffusion for Real-World Space-Time Video Super-Resolution cs.CV · 2026-05-13 · unverdicted · none · ref 58
DiffST delivers state-of-the-art real-world space-time video super-resolution with 17x faster inference than prior diffusion methods by using one-step sampling, cross-frame context aggregation, and video representation guidance.
FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching cs.CV · 2026-05-09 · unverdicted · none · ref 59 · 2 links
FlashClear delivers up to 122x faster object removal than prior diffusion models via adversarial step distillation and asymmetric attention caching while preserving visual quality.

Adding conditional control to text-to-image diffusion models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer