SDXL: Improving latent diffusion models for high-resolution image synthesis, 2023

Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach · 2023

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

AHPA adaptively aligns diffusion transformers to hierarchical VAE priors via a dynamic router that matches supervision granularity to the current noise level, improving convergence and quality.

SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

SteeringDiffusion supplies a bottlenecked, prompt-conditioned activation interface for frozen diffusion models that delivers smooth monotonic content-style control via one runtime scalar and timestep gating.

SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.

Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.

EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

EAM is a DiT-based blind super-resolution model that uses a triple-flow Ψ-DiT block, progressive masked image modeling, and in-context subject-aware prompting to reach state-of-the-art quantitative and visual results on standard datasets.

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

cs.CV · 2024-02-27 · unverdicted · novelty 4.0

Optimizing the noise schedule, preparing a balanced bucketed dataset, and aligning outputs with human preferences enables Playground v2.5 to reach state-of-the-art aesthetic quality across aspect ratios.

citing papers explorer

AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers cs.CV · 2026-05-05 · unverdicted · none · ref 24
SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models cs.CV · 2026-05-03 · unverdicted · none · ref 49
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models cs.CV · 2026-05-22 · unverdicted · none · ref 40
Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning cs.LG · 2026-04-10 · unverdicted · none · ref 47
EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution cs.CV · 2025-05-08 · unverdicted · none · ref 24
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation cs.CV · 2024-02-27 · unverdicted · none · ref 28
