Adding conditional control to text-to-image diffusion models

Lvmin Zhang, Anyi Rao, Maneesh Agrawala · 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction

cs.CV · 2026-01-26 · unverdicted · novelty 7.0

FreeOrbit4D recovers a foreground-complete 4D proxy via decoupled background and object-centric reconstruction to provide geometric guidance for large-angle camera redirection in monocular videos using conditional video diffusion.

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

cs.CV · 2025-09-16 · unverdicted · novelty 7.0

MapAnything is a unified feed-forward transformer that regresses metric 3D scene geometry and cameras from images using a factored representation of depth maps, ray maps, poses, and scale.

Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute

cs.CV · 2025-04-23 · unverdicted · novelty 6.0

A zero-shot subject-driven video generation framework that decomposes the task into identity injection from 200K subject-image pairs and motion preservation from 4K arbitrary videos, trained in 288 A100 GPU hours on CogVideoX-5B to match prior performance at 1% compute.

citing papers explorer

Showing 3 of 3 citing papers.

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction cs.CV · 2026-01-26 · unverdicted · none · ref 49
FreeOrbit4D recovers a foreground-complete 4D proxy via decoupled background and object-centric reconstruction to provide geometric guidance for large-angle camera redirection in monocular videos using conditional video diffusion.
MapAnything: Universal Feed-Forward Metric 3D Reconstruction cs.CV · 2025-09-16 · unverdicted · none · ref 85
MapAnything is a unified feed-forward transformer that regresses metric 3D scene geometry and cameras from images using a factored representation of depth maps, ray maps, poses, and scale.
Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute cs.CV · 2025-04-23 · unverdicted · none · ref 60
A zero-shot subject-driven video generation framework that decomposes the task into identity injection from 200K subject-image pairs and motion preservation from 4K arbitrary videos, trained in 288 A100 GPU hours on CogVideoX-5B to match prior performance at 1% compute.

Adding conditional control to text-to-image diffusion models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer