High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer · 2022

11 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Presents the first large-scale benchmark for multi-frame geometric distortion removal in videos under severe refractive warping, using real and synthetic data across four distortion levels and evaluating classical and learning-based methods including a proposed diffusion-based V-cache.

Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution

cs.CV · 2025-12-29 · unverdicted · novelty 7.0

IAFS is a training-free iterative inference-time scaling framework that uses adaptive frequency-aware particle fusion to resolve the perception-fidelity conflict in diffusion super-resolution models, outperforming prior scaling strategies.

See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models

cs.CV · 2025-10-06 · unverdicted · novelty 7.0

A time-reversed reconstruction method couples visual language models with constrained diffusion to generate past scene frames from current thermal traces in controlled scenarios.

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.

Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

cs.RO · 2026-02-26 · unverdicted · novelty 6.0

The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based end-to-end autonomous driving (E2E AD) framework. It presents insights on loss space, trajectory representation, and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.

Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?

cs.CV · 2025-10-11 · unverdicted · novelty 6.0

A video-trained large vision model achieves competitive zero-shot performance on organ segmentation, denoising, super-resolution, and 4D CT motion prediction in medical imaging, outperforming some specialized baselines on patient data from 122 cases.

π₀: A Vision-Language-Action Flow Model for General Robot Control

cs.LG · 2024-10-31 · unverdicted · novelty 6.0

π₀ is a vision-language-action flow model trained on diverse multi-platform robot data that supports zero-shot task performance, language instruction following, and efficient fine-tuning for dexterous tasks.

Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention

eess.IV · 2026-05-05 · unverdicted · novelty 5.0

A latent diffusion model jointly synthesizes MRI volumes and mixed-type tabular clinical data in a shared space via cross-attention and separate decoders after VAE fusion.

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model

cs.CV · 2026-03-12 · unverdicted · novelty 5.0

InSpatio-WorldFM is a frame-independent generative model that uses explicit 3D anchors and spatial memory to deliver real-time multi-view consistent spatial intelligence via a three-stage training pipeline from pretrained diffusion models.

Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation

cs.CV · 2026-04-07 · unverdicted · novelty 4.0

Selective aggregation of cross-attention maps from the most relevant heads in diffusion-based T2I models yields higher mean IoU for visual interpretation than standard aggregation methods like DAAM.

citing papers explorer

Showing 11 of 11 citing papers.

A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping · cs.CV · 2026-05-06 · unverdicted · none · ref 25
Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution · cs.CV · 2025-12-29 · unverdicted · none · ref 27
See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models · cs.CV · 2025-10-06 · unverdicted · none · ref 20
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents · cs.CV · 2026-04-28 · unverdicted · none · ref 12
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation · cs.CV · 2026-04-23 · unverdicted · none · ref 28
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving · cs.RO · 2026-02-26 · unverdicted · none · ref 44
Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging? · cs.CV · 2025-10-11 · unverdicted · none · ref 25
π₀: A Vision-Language-Action Flow Model for General Robot Control · cs.LG · 2024-10-31 · unverdicted · none · ref 40
Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention · eess.IV · 2026-05-05 · unverdicted · none · ref 20
InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model · cs.CV · 2026-03-12 · unverdicted · none · ref 27
Selective Aggregation of Attention Maps Improves Diffusion-Based Visual Interpretation · cs.CV · 2026-04-07 · unverdicted · none · ref 10
