SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, 2023
6 Pith papers cite this work.
citing papers explorer
- AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers
  AHPA adaptively aligns diffusion transformers to hierarchical VAE priors via a dynamic router that matches supervision granularity to the current noise level, improving convergence and quality.
- SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models
  SteeringDiffusion supplies a bottlenecked, prompt-conditioned activation interface for frozen diffusion models that delivers smooth monotonic content-style control via one runtime scalar and timestep gating.
- SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
  SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.
- Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning
  JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.
- EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution
  EAM is a DiT-based blind super-resolution model that uses a triple-flow Ψ-DiT block, progressive masked image modeling, and in-context subject-aware prompting to reach state-of-the-art quantitative and visual results on standard datasets.
- Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
  Optimizing the noise schedule, preparing a balanced bucketed dataset, and aligning outputs with human preferences enable Playground v2.5 to reach state-of-the-art aesthetic quality across aspect ratios.