A multimodal diffusion model generates controllable alternative streetscapes from street-view imagery using visual metrics and text, shown on Chicago and Orlando data with gains in semantic consistency.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages=
3 Pith papers cite this work.
years
2026: 3
representative citing papers
citing papers explorer
- Designing streetscapes from street-view imagery using diffusion models
  A multimodal diffusion model generates controllable alternative streetscapes from street-view imagery using visual metrics and text, shown on Chicago and Orlando data with gains in semantic consistency.
- VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement
  VoxScene is a new anchor-conditioned voxel diffusion model that synthesizes collision-free 3D indoor scene arrangements via discrete volumetric occupancies and uses the grids for asset retrieval.
- Image-aware Layout Generation with User Constraints for Poster Design
  A deep learning model generates image-aware poster layouts that satisfy user-specified attribute constraints via Gaussian noise sampling and partial layout constraints via a dedicated loss and random mask, reaching state-of-the-art performance.