Scalable diffusion models with transformers

William Peebles, Saining Xie · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

cs.CV · 2026-01-07 · unverdicted · novelty 7.0 · 2 refs

LocalDPO aligns text-to-video diffusion models with human preferences at the spatio-temporal region level by automatically generating localized preference pairs from corrupted real videos and applying a region-aware DPO loss.

DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

DreamShot uses video diffusion priors and a role-attention consistency loss to produce coherent, personalized storyboards with better character and scene continuity than text-to-image methods.

ProPhy: Progressive Physical Alignment for Dynamic World Simulation

cs.CV · 2025-12-05 · unverdicted · novelty 6.0

ProPhy adds explicit physics-aware conditioning via semantic and refinement experts plus VLM knowledge transfer to produce more physically coherent dynamic videos than prior methods.

Training-free, Perceptually Consistent Low-Resolution Previews with High-Resolution Image for Efficient Workflows of Diffusion Models

eess.IV · 2026-04-10 · unverdicted · novelty 5.0

A commutator-zero condition enables training-free generation of perceptually consistent low-resolution previews for high-resolution diffusion model outputs, achieving up to 33% computation reduction.

citing papers explorer

Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

cs.CV · 2026-01-07 · unverdicted · none · ref 41 · 2 links

LocalDPO aligns text-to-video diffusion models with human preferences at the spatio-temporal region level by automatically generating localized preference pairs from corrupted real videos and applying a region-aware DPO loss.

DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior

cs.CV · 2026-04-19 · unverdicted · none · ref 29

DreamShot uses video diffusion priors and a role-attention consistency loss to produce coherent, personalized storyboards with better character and scene continuity than text-to-image methods.

ProPhy: Progressive Physical Alignment for Dynamic World Simulation

cs.CV · 2025-12-05 · unverdicted · none · ref 22

ProPhy adds explicit physics-aware conditioning via semantic and refinement experts plus VLM knowledge transfer to produce more physically coherent dynamic videos than prior methods.

Training-free, Perceptually Consistent Low-Resolution Previews with High-Resolution Image for Efficient Workflows of Diffusion Models

eess.IV · 2026-04-10 · unverdicted · none · ref 36

A commutator-zero condition enables training-free generation of perceptually consistent low-resolution previews for high-resolution diffusion model outputs, achieving up to 33% computation reduction.
