High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer · 2022

11 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

Introduces adjoint-equation framework establishing dimension-free convergence bounds in any IPM for discrete diffusion models under masked and uniform priors.

Score-based Membership Inference on Diffusion Models

cs.LG · 2025-09-29 · unverdicted · novelty 7.0

Presents SimA, a score-based single-query membership inference attack for diffusion models and LDMs that uses denoiser output norm to reveal training set proximity and outperforms multi-query baselines on eight datasets.

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

cs.SD · 2025-09-28 · unverdicted · novelty 7.0

AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.

Diffusion Models Are Real-Time Game Engines

cs.LG · 2024-08-27 · conditional · novelty 7.0

A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.

From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

cs.LG · 2026-03-12 · unverdicted · novelty 6.0

EG-GRPO improves autoregressive text-to-image models by reallocating RL updates according to token entropy, excluding low-entropy tokens from reward signals while adding entropy bonuses to high-entropy ones, yielding state-of-the-art results on standard benchmarks.

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP

cs.CV · 2025-09-30 · unverdicted · novelty 6.0

SeMoBridge projects images into the text modality via a semantic bridge to reduce CLIP's intra-modal misalignment and improve few-shot performance.

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

cs.CV · 2025-09-29 · unverdicted · novelty 6.0 · 2 refs

Causal-Adapter adapts frozen diffusion backbones via structural causal modeling, prompt-aligned injection, and conditioned token contrastive loss to achieve faithful counterfactual generation with strong attribute control and identity preservation.

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

cs.CV · 2024-08-12 · unverdicted · novelty 6.0

CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

cs.CV · 2024-04-02 · unverdicted · novelty 6.0

CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection

cs.LG · 2024-02-27 · unverdicted · novelty 6.0

ConjNorm reframes OOD detection score design as optimizing norm p in an exponential family density model via a Bregman divergence theorem, with a tractable Monte Carlo estimator, claiming SOTA gains on CIFAR-100 and ImageNet-1K.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.

citing papers explorer

Showing 11 of 11 citing papers.

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space
cs.LG · 2026-05-17 · unverdicted · none · ref 89
Introduces adjoint-equation framework establishing dimension-free convergence bounds in any IPM for discrete diffusion models under masked and uniform priors.

Score-based Membership Inference on Diffusion Models
cs.LG · 2025-09-29 · unverdicted · none · ref 34
Presents SimA, a score-based single-query membership inference attack for diffusion models and LDMs that uses denoiser output norm to reveal training set proximity and outperforms multi-query baselines on eight datasets.

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
cs.SD · 2025-09-28 · unverdicted · none · ref 61
AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.

Diffusion Models Are Real-Time Game Engines
cs.LG · 2024-08-27 · conditional · none · ref 83
A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.

From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
cs.LG · 2026-03-12 · unverdicted · none · ref 23
EG-GRPO improves autoregressive text-to-image models by reallocating RL updates according to token entropy, excluding low-entropy tokens from reward signals while adding entropy bonuses to high-entropy ones, yielding state-of-the-art results on standard benchmarks.

SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP
cs.CV · 2025-09-30 · unverdicted · none · ref 18
SeMoBridge projects images into the text modality via a semantic bridge to reduce CLIP's intra-modal misalignment and improve few-shot performance.

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
cs.CV · 2025-09-29 · unverdicted · none · ref 45 · 2 links
Causal-Adapter adapts frozen diffusion backbones via structural causal modeling, prompt-aligned injection, and conditioned token contrastive loss to achieve faithful counterfactual generation with strong attribute control and identity preservation.

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
cs.CV · 2024-08-12 · unverdicted · none · ref 93
CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation
cs.CV · 2024-04-02 · unverdicted · none · ref 142
CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
cs.LG · 2024-02-27 · unverdicted · none · ref 58
ConjNorm reframes OOD detection score design as optimizing norm p in an exponential family density model via a Bregman divergence theorem, with a tractable Monte Carlo estimator, claiming SOTA gains on CIFAR-100 and ImageNet-1K.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation
cs.CV · 2026-05-18 · unverdicted · none · ref 31
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.
