High-resolution image synthesis with latent diffusion models, 2022
11 Pith papers cite this work.
citing papers explorer
- AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers
  AHPA adaptively aligns diffusion transformers to hierarchical VAE priors via a dynamic router that matches supervision granularity to the current noise level, improving convergence and quality.
- SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models
  SteeringDiffusion supplies a bottlenecked, prompt-conditioned activation interface for frozen diffusion models that delivers smooth monotonic content-style control via one runtime scalar and timestep gating.
- Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation
  KVBench reveals major gaps in current T2I models for knowledge-intensive tasks, and KE-Check narrows the gap between open- and closed-source models by adding structured knowledge and enforcing constraints.
- FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
  FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.
- Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
  Fast-dLLM adds reusable KV cache blocks and selective parallel decoding to diffusion LLMs, closing most of the speed gap with autoregressive models without retraining.
- MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation
  MaSC is a masked similarity metric that decomposes concept-driven image generation evaluation into subject-specific preservation and background-based prompt following using SigLIP2 embeddings, outperforming global baselines on human correlation and identity benchmarks.
- Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
  Mean-Variance Split residuals separate centered variation from mean updates to prevent collapse and enable stable training of 1000-layer Diffusion Transformers.
- A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models
  A reusable architecture for joint spatiotemporal super-resolution of precipitation that adapts to scaling factors from 1 to 25 in space and 1 to 6 in time via hyperparameter retuning and optional mass conservation.
- EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution
  EAM is a DiT-based blind super-resolution model that uses a triple-flow Ψ-DiT block, progressive masked image modeling, and in-context subject-aware prompting to reach state-of-the-art quantitative and visual results on standard datasets.
- SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection
  A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.
- Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
  Optimizing the noise schedule, preparing a balanced bucketed dataset, and aligning outputs with human preferences enable Playground v2.5 to reach state-of-the-art aesthetic quality across aspect ratios.