High-resolution image synthesis with latent diffusion models, 2022

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer · 2022

11 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

AHPA adaptively aligns diffusion transformers to hierarchical VAE priors via a dynamic router that matches supervision granularity to the current noise level, improving convergence and quality.

SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

SteeringDiffusion supplies a bottlenecked, prompt-conditioned activation interface for frozen diffusion models that delivers smooth monotonic content-style control via one runtime scalar and timestep gating.

Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

cs.CV · 2026-04-24 · conditional · novelty 7.0

KVBench reveals major gaps in current T2I models for knowledge-intensive tasks, and KE-Check narrows the gap between open- and closed-source models by adding structured knowledge and enforcing constraints.

FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

cs.CV · 2025-06-26 · unverdicted · novelty 7.0

FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

cs.CL · 2025-05-28 · conditional · novelty 7.0

Fast-dLLM adds reusable KV cache blocks and selective parallel decoding to diffusion LLMs, closing most of the speed gap with autoregressive models without retraining.

MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

MaSC is a masked similarity metric that decomposes concept-driven image generation evaluation into subject-specific preservation and background-based prompt following using SigLIP2 embeddings, outperforming global baselines on human correlation and identity benchmarks.

Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Mean-Variance Split residuals separate centered variation from mean updates to prevent collapse and enable stable training of 1000-layer Diffusion Transformers.

A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

A reusable architecture for joint spatiotemporal super-resolution of precipitation that adapts to scaling factors from 1-25 in space and 1-6 in time via hyperparameter retuning and optional mass conservation.

EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

EAM is a DiT-based blind super-resolution model that uses a triple-flow Ψ-DiT block, progressive masked image modeling, and in-context subject-aware prompting to reach state-of-the-art quantitative and visual results on standard datasets.

SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

cs.CV · 2024-02-27 · unverdicted · novelty 4.0

Optimizing the noise schedule, preparing a balanced bucketed dataset, and aligning outputs with human preferences enable Playground v2.5 to reach state-of-the-art aesthetic quality across aspect ratios.

citing papers explorer

AHPA: Adaptive Hierarchical Prior Alignment for Diffusion Transformers
cs.CV · 2026-05-05 · unverdicted · none · ref 27

SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models
cs.CV · 2026-05-03 · unverdicted · none · ref 4

Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation
cs.CV · 2026-04-24 · conditional · none · ref 54

FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
cs.CV · 2025-06-26 · unverdicted · none · ref 33

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
cs.CL · 2025-05-28 · conditional · none · ref 25

MaSC: A Masked Similarity Metric for Evaluating Concept-Driven Generation
cs.CV · 2026-05-21 · unverdicted · none · ref 24

Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
cs.LG · 2026-05-07 · unverdicted · none · ref 11

A Scale-Adaptive Framework for Joint Spatiotemporal Super-Resolution with Diffusion Models
cs.LG · 2026-04-23 · unverdicted · none · ref 25

EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution
cs.CV · 2025-05-08 · unverdicted · none · ref 25

SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection
cs.CV · 2026-04-29 · unverdicted · none · ref 34

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
cs.CV · 2024-02-27 · unverdicted · none · ref 29